SPECTRAL THRESHOLDS IN THE BIPARTITE STOCHASTIC BLOCK MODEL 


LAURA FLORESCU AND WILL PERKINS 


Abstract. We consider a bipartite stochastic block model on vertex sets $V_1$ and $V_2$, with planted
partitions in each, and ask at what densities efficient algorithms can recover the partition of the smaller
vertex set.

When $|V_2| \gg |V_1|$, multiple thresholds emerge. We first locate a sharp threshold for detection
of the partition, in the sense of the results of [32, 31] and [29] for the stochastic block model. We
then show that at a higher edge density, the singular vectors of the rectangular biadjacency matrix
exhibit a localization / delocalization phase transition, giving recovery above the threshold and no
recovery below. Nevertheless, we propose a simple spectral algorithm, Diagonal Deletion SVD,
which recovers the partition at a nearly optimal edge density.

The bipartite stochastic block model studied here was used by [19] to give a unified algorithm for
recovering planted partitions and assignments in random hypergraphs and random k-SAT formulae
respectively. Our results give the best known bounds for the clause density at which solutions can
be found efficiently in these models, and they also show a barrier to further improvement via this
reduction to the bipartite block model.


1. Introduction 

The stochastic block model is a widely studied model of community detection in random graphs, 
introduced by [23]. A simple description of the model is as follows: we start with n vertices, 
divided into two or more communities, then add edges independently at random, with probabilities 
depending on which communities the endpoints belong to. The algorithmic task is then to infer the 
communities from the graph structure. 

A different class of models of random computational problems with planted solutions is that
of planted satisfiability problems: we start with an assignment $\sigma$ to $n$ boolean variables and then
choose clauses independently at random that are satisfied by $\sigma$. The task is to recover $\sigma$ given the
random formula. A closely related problem is that of recovering the planted assignment in [20]'s
one-way function; see Section 3.1.

A priori, the stochastic block model and planted satisfiability may seem only tangentially related. 
Nevertheless, two observations reveal a strong connection: 

(1) Planted satisfiability can be viewed as a k-uniform hypergraph stochastic block model, with
the set of 2n boolean literals partitioned into two communities of true and false literals
under the planted assignment, and clauses represented as hyperedges.

(2) [19] gave a general algorithm for a unified model of planted satisfiability problems which 
reduces a random formula with a planted assignment to a bipartite stochastic block model 
with planted partitions in each of the two parts. 

The bipartite stochastic block model in [19] has the distinctive feature that the two sides of the
bipartition are extremely unbalanced; in reducing from a planted k-satisfiability problem on n
variables, one side is of size $\Theta(n)$ while the other can be as large as $\Theta(n^{k-1})$.

We study this bipartite block model in detail, first locating a sharp threshold for detection and 
then studying the performance of spectral algorithms. 
Our main contributions are the following: 

(1) When the ratio of the sizes of the two parts diverges, we locate a sharp threshold below which 
detection is impossible and above which an efficient algorithm succeeds (Theorems 1 and 
2). The proof of impossibility follows that of [32] in the stochastic block model, with the 
change that we couple the graph to a broadcast model on a two-type Poisson Galton-Watson 
tree. The algorithm we propose involves a reduction to the stochastic block model and the 
algorithms of [29, 31]. 

(2) We next consider spectral algorithms and show that computing the singular value decomposition
(SVD) of the biadjacency matrix $M$ of the model can succeed in recovering the
planted partition even when the norm of the ‘signal’, $\|\mathbb{E}M\|$, is much smaller than the norm
of the ‘noise’, $\|M - \mathbb{E}M\|$ (Theorem 3).

(3) We show that at a sparser density, the SVD fails due to a localization phenomenon in the 
singular vectors: almost all of the weight of the top singular vectors is concentrated on a 
vanishing fraction of coordinates (Theorem 4). 

(4) We propose a modification of the SVD algorithm, Diagonal Deletion SVD, that succeeds at
a sparser density still, far below the failure of the SVD (Theorem 3).

(5) We apply the first algorithm to planted hypergraph partitioning and planted satisfiability
problems to find the best known general bounds on the density at which the planted partition or
assignment can be recovered efficiently (Theorem 5).

2. The model and main results 

The bipartite stochastic block model. Fix parameters $\delta \in [0,2]$, $n_1 \le n_2$, and $p \in [0,1/2]$. Then we
define the bipartite stochastic block model as follows:

• Take two vertex sets $V_1, V_2$, with $|V_1| = n_1$, $|V_2| = n_2$.

• Assign labels ‘+’ and ‘−’ independently with probability 1/2 to each vertex in $V_1$ and $V_2$.
Let $\sigma \in \{\pm 1\}^{n_1}$ denote the labels of the vertices in $V_1$ and $\tau \in \{\pm 1\}^{n_2}$ denote the labels
of $V_2$.

• Add edges independently at random between $V_1$ and $V_2$ as follows: for $u \in V_1$, $v \in V_2$ with
$\sigma(u) = \tau(v)$, add the edge $(u,v)$ with probability $\delta p$; for $\sigma(u) \neq \tau(v)$, add $(u,v)$ with
probability $(2-\delta)p$.
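As a concrete illustration, the sampling procedure above can be sketched in Python with NumPy. This is a minimal sketch, not code from the paper: the function name `sample_bipartite_sbm` and the dense-matrix representation are our own assumptions, suitable only for small $n_1, n_2$.

```python
import numpy as np


def sample_bipartite_sbm(n1, n2, delta, p, rng=None):
    """Sample (M, sigma, tau) from the bipartite stochastic block model.

    M is the n1 x n2 biadjacency matrix: entry (u, v) is 1 with probability
    delta * p when sigma[u] == tau[v] and (2 - delta) * p otherwise.
    """
    if not (0 <= delta <= 2 and 0 <= p <= 0.5):
        raise ValueError("need delta in [0, 2] and p in [0, 1/2]")
    rng = np.random.default_rng(rng)
    sigma = rng.choice([-1, 1], size=n1)   # labels of V1, each +/-1 w.p. 1/2
    tau = rng.choice([-1, 1], size=n2)     # labels of V2
    same = np.equal.outer(sigma, tau)      # True where sigma(u) == tau(v)
    probs = np.where(same, delta * p, (2 - delta) * p)
    # independent Bernoulli(probs[u, v]) entries
    M = (rng.random((n1, n2)) < probs).astype(np.int8)
    return M, sigma, tau
```

For example, `sample_bipartite_sbm(200, 20000, 1.5, 1e-3, rng=0)` draws one instance; the dense matrix has $n_1 n_2$ entries, so a sparse representation would be needed at realistic sizes.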

Algorithmic task: Determine the labels of the vertices given the bipartite graph, and do so with an 
efficient algorithm at the smallest possible edge density p. 

Preliminaries and assumptions. In the application to planted satisfiability, it suffices to recover $\sigma$, the
partition of the smaller vertex set, $V_1$, and so we focus on that task here; we will accomplish that task
even when the number of edges is much smaller than the size of $V_2$. For a planted k-SAT problem
or k-uniform hypergraph partitioning problem on n variables or vertices, the reduction gives vertex
sets of size $n_1 = \Theta(n)$, $n_2 = \Theta(n^{k-1})$, and so the relevant cases are extremely unbalanced.

We will say that an algorithm detects the partition if for some fixed $\epsilon > 0$, independent of $n_1$,
whp it returns an $\epsilon$-correlated partition, i.e. a partition that agrees with $\sigma$ on a $(1/2 + \epsilon)$-fraction of
vertices in $V_1$ (up to the sign of $\sigma$).

We will say an algorithm recovers the partition of $V_1$ if whp the algorithm returns a partition that
agrees with $\sigma$ on a $1 - o(1)$ fraction of vertices in $V_1$. Note that agreement is up to sign, as $\sigma$ and $-\sigma$
give the same partition.
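A small helper makes the ‘up to sign’ convention explicit. The name `agreement` is hypothetical; it returns the fraction of $V_1$ on which a $\pm 1$ vector matches $\sigma$ after choosing the better global sign, so detection asks for at least $1/2 + \epsilon$ and recovery for $1 - o(1)$.

```python
import numpy as np


def agreement(x, sigma):
    """Fraction of V1 on which the +/-1 vector x matches sigma, up to global sign."""
    x = np.asarray(x)
    sigma = np.asarray(sigma)
    frac = float(np.mean(x == sigma))
    # x and -x describe the same partition, so keep the better of the two
    return max(frac, 1.0 - frac)
```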

2.1. Optimal algorithms for detection. On the basis of heuristic analysis of the belief propagation
algorithm, [15] made the striking conjecture that in the two-part stochastic block model, with interior
edge probability $a/n$ and crossing edge probability $b/n$, there is a sharp threshold for detection: for
$(a-b)^2 > 2(a+b)$ detection can be achieved with an efficient algorithm, while for
$(a-b)^2 < 2(a+b)$, detection is impossible for any algorithm. This conjecture was proved by [32, 31] and
[29].

Figure 1. Bipartite stochastic block model on $V_1$ and $V_2$. Red edges are added
with probability $\delta p$ and blue edges are added with probability $(2-\delta)p$.

Our first result is an analogous sharp threshold for detection in the bipartite stochastic block
model at $p = (\delta-1)^{-2}(n_1 n_2)^{-1/2}$, with an algorithm based on a reduction to the SBM, and a
lower bound based on a connection with the non-reconstruction of a broadcast process on a tree
associated to a two-type Galton-Watson branching process (analogous to the proof for the SBM [32],
which used a single-type Galton-Watson process).


Algorithm: SBM Reduction. 

(1) Construct a graph G' on the vertex set $V_1$ by joining u and w if they are both connected to
the same vertex $v \in V_2$ and v has degree exactly 2.

(2) Randomly sparsify the graph (as detailed in Section 5).

(3) Apply an optimal algorithm for detection in the SBM from [29, 31, 8] to partition $V_1$.
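Step (1) is a one-mode projection restricted to the degree-2 vertices of $V_2$. Below is a minimal sketch, assuming $M$ is given as a dense 0/1 NumPy array with rows indexed by $V_1$. The function name is hypothetical. Steps (2) and (3), the sparsification and the SBM detection algorithm of [29, 31, 8], are not reproduced here.

```python
import numpy as np


def degree_two_projection(M):
    """For each v in V2 of degree exactly 2, join its two neighbours u, w in V1.

    Returns a list of pairs (u, w) with u < w; repeats are kept, so the result
    is a multiset of edges on V1.
    """
    M = np.asarray(M)
    cols = np.flatnonzero(M.sum(axis=0) == 2)  # vertices of V2 with degree 2
    edges = []
    for v in cols:
        u, w = np.flatnonzero(M[:, v])  # its two neighbours, in increasing order
        edges.append((int(u), int(w)))
    return edges
```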


Theorem 1. Let $\delta \in [0,2] \setminus \{1\}$ be fixed and $n_2 = \omega(n_1)$. Then there is a polynomial-time algorithm
that detects the partition $V_1 = A_1 \cup B_1$ whp if

$$p = \frac{1+\epsilon}{(\delta-1)^2\sqrt{n_1 n_2}}$$

for any fixed $\epsilon > 0$.

Theorem 2. On the other hand, if $n_2 \ge n_1$ and

$$p \le \frac{1}{(\delta-1)^2\sqrt{n_1 n_2}},$$

then no algorithm can detect the partition whp.

Note that for $p < 1/\sqrt{n_1 n_2}$ it is clear that detection is impossible: whp there is no giant component
in the graph. The content of Theorem 2 is finding the sharp dependence on $\delta$.
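For orientation, the two densities in this section can be computed directly. The function names below are hypothetical. Since $(\delta-1)^2 \le 1$ for $\delta \in [0,2]$, the detection threshold of Theorems 1 and 2 is never below the giant component threshold $1/\sqrt{n_1 n_2}$.

```python
import math


def detection_threshold(n1, n2, delta):
    """p* = 1 / ((delta - 1)^2 sqrt(n1 n2)), the sharp threshold of Theorems 1 and 2."""
    if delta == 1:
        raise ValueError("delta = 1 carries no information about the partition")
    return 1.0 / ((delta - 1) ** 2 * math.sqrt(n1 * n2))


def giant_component_threshold(n1, n2):
    """Density 1 / sqrt(n1 n2) below which whp there is no giant component."""
    return 1.0 / math.sqrt(n1 * n2)
```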
2.2. Spectral algorithms. One common approach to graph partitioning is spectral: compute eigenvectors
or singular vectors of an appropriate matrix and round the vector(s) to partition the vertex set.
In our setting, we can take the $n_1 \times n_2$ rectangular biadjacency matrix $M$, with rows and columns
indexed by the vertices of $V_1$ and $V_2$ respectively, with a 1 in the entry $(u, v)$ if the edge $(u, v)$ is
present, and a 0 otherwise. The matrix $M$ has independent entries that are 1 with probability $\delta p$ or
$(2-\delta)p$ depending on the labels of u and v, and 0 otherwise.


Algorithm: Singular Value Decomposition. 

(1) Compute the left singular vector of M corresponding to the second largest singular value. 

(2) Round the singular vector to a vector $z \in \{\pm 1\}^{n_1}$ by taking the sign of each entry.
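The SVD algorithm above translates almost line for line into NumPy. This is a minimal sketch with a hypothetical function name. It relies on `numpy.linalg.svd`, which returns singular values in descending order. Exact zero entries are rounded to $+1$, a tie-breaking choice the text does not specify.

```python
import numpy as np


def svd_partition(M):
    """Unmodified SVD algorithm: sign-round the left singular vector of M
    belonging to the second largest singular value."""
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    # singular values come back in descending order, so column 1 of U is the
    # left singular vector for the second largest singular value
    z = np.sign(U[:, 1])
    z[z == 0] = 1  # tie-breaking for exact zeros (our choice)
    return z.astype(int)
```

Consider the noiseless matrix $\mathbf{1}\mathbf{1}^\top + \tfrac12 \sigma\tau^\top$ with balanced $\sigma$ and $\tau$. Its second left singular vector is exactly $\pm\sigma/\sqrt{n_1}$, so the rounding returns $\pm\sigma$.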


A typical analysis of spectral algorithms requires that the second largest eigenvalue or singular
value of the expectation matrix $\mathbb{E}M$ is much larger than the spectral norm of the noise matrix,
$M - \mathbb{E}M$. But here we have $\|M - \mathbb{E}M\| = \Theta(\sqrt{n_2 p})$, which is in fact much larger than
$\lambda_2(\mathbb{E}M) = \Theta(p\sqrt{n_1 n_2})$ when $p = o(n_1^{-1})$. Does this doom the spectral approach at lower densities?

Question 1. For what values of $p = p(n_1, n_2)$ is the singular value decomposition (SVD) of $M$
correlated with the vector $\sigma$ indicating the partition of $V_1$?

In particular, this question was asked by [19]. We show that there are two thresholds, both well
below $p = n_1^{-1}$: at $p = \tilde\Omega(n_1^{-2/3} n_2^{-1/3})$ the second singular vector of $M$ is correlated with the
partition of $V_1$, but below this density, it is uncorrelated with the partition, and in fact localized.
Nevertheless, we give a simple spectral algorithm based on modifications of $M$ that matches the
bound $p = \tilde O((n_1 n_2)^{-1/2})$ achieved with subsampling by [19]. In the case of very unbalanced
sizes, in particular in the applications noted above, these thresholds can differ by a polynomial
factor in $n_1$.


Algorithm: Diagonal Deletion SVD. 

(1) Let $B = MM^\top - \mathrm{diag}(MM^\top)$ (set the diagonal entries of $MM^\top$ to 0).

(2) Compute the second eigenvector of $B$.

(3) Round the eigenvector to a vector $z \in \{\pm 1\}^{n_1}$ by taking the sign of each entry.
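A corresponding sketch of Diagonal Deletion SVD follows. We read ‘second eigenvector’ as the eigenvector for the second largest (algebraic) eigenvalue of the symmetric matrix $B$. That reading, the tie-breaking and the function name are assumptions. `numpy.linalg.eigh` returns eigenvalues in ascending order, hence the column index `-2`.

```python
import numpy as np


def diagonal_deletion_partition(M):
    """Diagonal Deletion SVD: zero the diagonal of M M^T, then sign-round the
    eigenvector of the second largest eigenvalue."""
    M = np.asarray(M, dtype=float)
    B = M @ M.T
    np.fill_diagonal(B, 0.0)   # B = M M^T - diag(M M^T)
    w, V = np.linalg.eigh(B)   # eigenvalues of symmetric B, ascending order
    z = np.sign(V[:, -2])      # eigenvector of the second largest eigenvalue
    z[z == 0] = 1              # tie-breaking for exact zeros (our choice)
    return z.astype(int)
```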


Our results locate two different thresholds for spectral algorithms for the bipartite block model:
while the usual SVD is only effective with $p = \tilde\Omega(n_1^{-2/3} n_2^{-1/3})$, the modified diagonal deletion
algorithm is effective already at $p = \tilde\Omega(n_1^{-1/2} n_2^{-1/2})$, which is optimal up to logarithmic factors. In
particular, when $n_1 = n$, $n_2 = n^{k-1}$ for some $k \ge 3$, as in the application above, these thresholds
are separated by a polynomial factor in $n$.
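The size of the gap can be checked with exact arithmetic. Ignoring logarithmic factors and taking $n_1 = n$, $n_2 = n^{k-1}$, the SVD threshold is $n^{-(k+1)/3}$ and the Diagonal Deletion threshold is $n^{-k/2}$, so their ratio is $n^{(k-2)/6}$. The helper below is an illustrative sketch with a hypothetical name.

```python
from fractions import Fraction


def threshold_exponents(k):
    """Exponents e (with p = n^e, log factors ignored) when n1 = n, n2 = n^(k-1).

    Returns (svd, dd) where
      svd: n1^(-2/3) n2^(-1/3)  ->  -(k+1)/3
      dd:  n1^(-1/2) n2^(-1/2)  ->  -k/2
    """
    svd = Fraction(-2, 3) + Fraction(-1, 3) * (k - 1)
    dd = Fraction(-1, 2) + Fraction(-1, 2) * (k - 1)
    return svd, dd


for k in (3, 4, 5):
    svd, dd = threshold_exponents(k)
    print(k, svd, dd, "gap exponent:", svd - dd)  # gap exponent equals (k - 2)/6
```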

First we give positive results for recovery using the two spectral algorithms. 

Theorem 3. Let $n_2 \ge n_1 \log^4 n_1$, with $n_1 \to \infty$. Let $\delta \in [0,2] \setminus \{1\}$ be fixed with respect to $n_1, n_2$.
Then there exists a universal constant $C > 0$ so that

(1) If $p = C(n_1 n_2)^{-1/2} \log n_1$, then whp the diagonal deletion SVD algorithm recovers the
partition $V_1 = A_1 \cup B_1$.

(2) If $p = C n_1^{-2/3} n_2^{-1/3} \log n_1$, then whp the unmodified SVD algorithm recovers the partition.

Next we show that below the recovery threshold for the SVD, the top left singular vectors are in 
fact localized: they have nearly all of their mass on a vanishingly small fraction of coordinates. 


Figure 2. Main theorems illustrated.

Theorem 4. Let $n_2 \ge n_1 \log^4 n_1$. For any constant $c > 0$, let $p = c\, n_1^{-2/3} n_2^{-1/3}$, $t \le n_1^{1/3}$, and $r = n_1/\log n_1$.

Let $\hat\sigma = \sigma/\sqrt{n_1}$, and let $v_1, v_2, \ldots, v_t$ be the top $t$ left unit-norm singular vectors of $M$.

Then, whp, there exists a set $S \subset \{1, \ldots, n_1\}$ of coordinates, $|S| \le r$, so that for all $1 \le i \le t$,
there exists a unit vector $u_i$ supported on $S$ so that

$$\|v_i - u_i\| = o(1).$$

That is, each of the first $t$ singular vectors has nearly all of its weight on the coordinates in $S$. In
particular, since $|\hat\sigma \cdot u_i| \le \sqrt{|S|/n_1} \le (\log n_1)^{-1/2}$, this implies that for all $1 \le i \le t$, $v_i$ is
asymptotically uncorrelated with the planted partition:

$$|\hat\sigma \cdot v_i| = o(1).$$
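To inspect localization numerically, one can measure how much of a vector's mass sits on its largest coordinates, and its overlap with $\hat\sigma$. This is a minimal sketch with hypothetical helper names. Theorem 4 asserts a single set $S$ shared by all $t$ vectors, whereas `top_mass` below looks at one vector at a time.

```python
import numpy as np


def top_mass(v, r):
    """Share of ||v||^2 carried by the r largest-magnitude coordinates of v.

    Values close to 1 with r much smaller than len(v) indicate localization.
    """
    v = np.asarray(v, dtype=float)
    sq = np.sort(v ** 2)[::-1]  # squared entries, largest first
    return float(sq[:r].sum() / sq.sum())


def correlation_with_partition(v, sigma):
    """|sigma_hat . v| with sigma_hat = sigma / sqrt(n1) and v normalised to unit length."""
    sigma = np.asarray(sigma, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(abs(sigma @ v) / (np.sqrt(len(sigma)) * np.linalg.norm(v)))
```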

One point of interest in Theorem 4 is that in this case of a random biadjacency matrix of unbalanced
dimension, the localization and delocalization of the singular vectors can be understood
and analyzed in a simple manner, in contrast to the more delicate phenomenon for random square
adjacency matrices.

Our techniques use bounds on the norms of random matrices and eigenvector perturbation theorems,
applied to carefully chosen decompositions of the matrices of interest. In particular, our
proof technique suggested the Diagonal Deletion SVD, which proved much more effective than the 
usual SVD algorithm on these unbalanced bipartite block models, and has the advantage over more 
sophisticated approaches of being extremely simple to describe and implement. We believe it may 
prove effective in many other settings. 

Under what conditions might we expect the Diagonal Deletion SVD to outperform the usual SVD? 
The SVD is a central algorithm in statistics, machine learning, and computer science, and so any 
general improvement would be useful. The bipartite block model addressed here has two distinctive 
characteristics: the dimensions of the matrix M are extremely unbalanced, and the entries are very 
sparse Bernoulli random variables, a distribution whose fourth moment is much larger than the 
square of its second moment. These two facts together lead to the phenomenon of multiple spectral 
thresholds and the outperformance of the SVD by the Diagonal Deletion SVD. When both of these
conditions hold we expect dramatic improvement by using diagonal deletion, while under only one of
them we expect mild improvement. We expect diagonal deletion to be effective in the
more general setting of recovering a low-rank matrix in the presence of random noise, beyond our
setting of adjacency matrices of graphs.
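A short calculation makes the two distinctive features concrete. For a Bernoulli entry the fourth moment is far larger than the squared second moment. The diagonal that Diagonal Deletion SVD removes is exactly the degree sequence of $V_1$. The approximation for $\mathbb{E}\deg(u)$ assumes the labels of $V_2$ are roughly balanced.

```latex
% Entries of M are X ~ Bernoulli(q) with q \in \{\delta p, (2-\delta)p\}.
\mathbb{E}[X^2] = \mathbb{E}[X^4] = q,
\qquad
\frac{\mathbb{E}[X^4]}{\left(\mathbb{E}[X^2]\right)^2} = \frac{1}{q} \longrightarrow \infty
\quad \text{as } q \to 0.

% Since M_{uv} \in \{0,1\}, the diagonal removed by Diagonal Deletion SVD is
% the degree sequence of V_1:
(MM^\top)_{uu} = \sum_{v \in V_2} M_{uv}^2 = \sum_{v \in V_2} M_{uv} = \deg(u),
\qquad
\mathbb{E}\,\deg(u) \approx n_2 p .
```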

3. Planted k-SAT and hypergraph partitioning 

[19] reduce three planted problems to solving the bipartite block model: planted hypergraph 
partitioning, planted random k-SAT, and Goldreich’s planted CSP. We describe the reduction here 
and calculate the density at which our algorithm can detect the planted solution by solving the 
resulting bipartite block model. 
We state the general model in terms of hypergraph partitioning first. 

Planted hypergraph partitioning. Fix a function $Q : \{\pm 1\}^k \to [0,1]$ so that $\sum_{x \in \{\pm 1\}^k} Q(x) = 1$.

Fix parameters $n$ and $p \in (0,1)$ so that $\max_x Q(x) 2^k p \le 1$. Then we define the planted k-uniform
hypergraph partitioning model as follows:

• Take a vertex set $V$ of size $n$.

• Assign labels ‘+’ and ‘−’ independently with probability 1/2 to each vertex in $V$. Let
$\sigma \in \{\pm 1\}^n$ denote the labels of the vertices.

• Add (ordered) k-uniform hyperedges independently at random according to the distribution

$$\Pr(e) = 2^k p \cdot Q(\sigma(e)),$$
where $\sigma(e)$ is the evaluation of $\sigma$ on the vertices in $e$.
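The model above can be sampled by brute force for very small $n$. This sketch assumes `Q` is a Python function on $k$-tuples of $\pm 1$. It reads ‘ordered k-uniform hyperedges’ as ordered $k$-tuples of distinct vertices. The function name is hypothetical and the enumeration of all tuples is only for illustration.

```python
import itertools

import numpy as np


def sample_planted_hypergraph(n, k, Q, p, rng=None):
    """Brute-force sampler for the planted k-uniform hypergraph model.

    Q maps a k-tuple in {+1,-1}^k to [0, 1] and must sum to 1 over all 2^k
    tuples.  Every ordered k-tuple e of distinct vertices is included
    independently with probability 2^k * p * Q(sigma(e)).
    """
    rng = np.random.default_rng(rng)
    cube = list(itertools.product([1, -1], repeat=k))
    if not np.isclose(sum(Q(x) for x in cube), 1.0):
        raise ValueError("Q must sum to 1 over {+1,-1}^k")
    if max(Q(x) for x in cube) * 2 ** k * p > 1:
        raise ValueError("need max_x Q(x) 2^k p <= 1")
    sigma = rng.choice([-1, 1], size=n)
    edges = []
    for e in itertools.permutations(range(n), k):  # ordered tuples, distinct vertices
        prob = 2 ** k * p * Q(tuple(int(sigma[i]) for i in e))
        if rng.random() < prob:
            edges.append(e)
    return sigma, edges
```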

Algorithmic task: Determine the labels of the vertices given the hypergraph, and do so with an 
efficient algorithm at the smallest possible edge density p. 

Usually $Q$ will be symmetric in the sense that $Q(x)$ depends only on the number of $+1$'s in the
vector $x$, and in this case we can view hyperedges as unordered. We assume that $Q$ is not identically
$2^{-k}$, as this distribution would simply be uniform and the planted partition would not be evident.

Planted k-satisfiability is defined similarly: we fix an assignment $\sigma$ to $n$ boolean variables which
induces a partition of the set of $2n$ literals (boolean variables and their negations) into true and
false literals. Then we add k-clauses independently at random, with probability proportional to the
evaluation of $Q$ on the $k$ literals of the clause.

Planting distributions for the above problems are classified by their distribution complexity,
$r = \min_{S \neq \emptyset}\{|S| : \hat{Q}(S) \neq 0\}$, where $\hat{Q}(S)$ is the discrete Fourier coefficient of $Q$ corresponding to the
subset $S \subseteq [k]$. This is an integer between 1 and $k$, where $k$ is the uniformity of the hyperedges or
clauses.
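The Fourier coefficients and the distribution complexity $r$ can be computed by enumerating $\{\pm 1\}^k$. The normalization $\hat Q(S) = 2^{-k}\sum_x Q(x)\prod_{i \in S} x_i$ is an assumption on our part. It is the one under which $\delta = 1 + 2^k \hat Q(S)$ always lies in $[0,2]$. The function names are hypothetical.

```python
import itertools
import math


def fourier_coefficient(Q, k, S):
    """hat Q(S) = 2^{-k} sum_x Q(x) prod_{i in S} x_i (normalisation assumed)."""
    total = 0.0
    for x in itertools.product([1, -1], repeat=k):
        total += Q(x) * math.prod(x[i] for i in S)
    return total / 2 ** k


def distribution_complexity(Q, k, tol=1e-12):
    """Smallest |S| over non-empty S subset of [k] with hat Q(S) != 0."""
    for size in range(1, k + 1):
        for S in itertools.combinations(range(k), size):
            if abs(fourier_coefficient(Q, k, S)) > tol:
                return size
    raise ValueError("Q is uniform: every non-empty coefficient vanishes")
```

Take the 3-XOR (parity) distribution that puts mass $1/4$ on each $x$ with $x_1 x_2 x_3 = 1$. Among nonempty sets, only $\hat Q(\{1,2,3\}) = 1/8$ is nonzero, so $r = 3$.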

A consequence of Theorem 1 is the following: 

Theorem 5. There is an efficient algorithm to detect the planted partition in the random k-uniform
hypergraph partitioning problem, with planting function $Q$, when

$$p > (1+\epsilon) \min_{\emptyset \neq S \subseteq [k]} \frac{1}{2^{2k}\hat{Q}(S)^2\, n^{k-|S|/2}}$$

for any fixed $\epsilon > 0$. Similarly, in the planted k-satisfiability model with planting function $Q$, there
is an efficient algorithm to detect the planted assignment when

$$p > (1+\epsilon) \min_{\emptyset \neq S \subseteq [k]} \frac{1}{2^{2k}\hat{Q}(S)^2\, (2n)^{k-|S|/2}}.$$

In both cases, if the distribution complexity of $Q$ is at least 3, we can achieve full recovery at the
given density.

Proof. Suppose $Q$ has distribution complexity $r$. Fix a set $S \subseteq [k]$ with $\hat{Q}(S) \neq 0$ and $|S| = r$. The
first step of the reduction of [19] transforms each k-uniform hyperedge into an r-uniform hyperedge
by selecting the vertices indicated by the set $S$. Then a bipartite block model is constructed on vertex
sets $V_1, V_2$, with $V_1$ the set of all vertices in the hypergraph (or literals in the formula), and $V_2$ the
set of all $(r-1)$-tuples of vertices or literals. An edge is added by taking each r-uniform edge,
splitting it randomly into sets of size 1 and $r-1$, and joining the associated vertices in $V_1$ and $V_2$.
The parameters in our model are $n_1 = n$ and $n_2 \sim n^{r-1}$ (considering ordered $(r-1)$-tuples of
vertices or literals).

These edges appear with probabilities that depend on the parity of the number of vertices on one
side of the original partition in the joined sets, exactly the bipartite block model addressed in this
paper; the parameter $\delta$ in the model is given by $\delta = 1 + 2^k\hat{Q}(S)$ (see Lemma 1 of [19]).
Theorem 1 then states that detection in the resulting block model exhibits a sharp threshold at edge density $p^*$,
with $p^* = \frac{1}{2^{2k}\hat{Q}(S)^2 n^{k-r/2}}$. The difference in bounds in Theorem 5 is due to the two models having
n vertices and 2n literals respectively.

To go from an $\epsilon$-correlated partition to full recovery, if $r \ge 3$, we can appeal to Theorem 2 of [7]
and achieve full recovery using only a linear number of additional hyperedges or clauses, which is
lower order than the $\Theta(n^{r/2})$ used by our algorithm. □
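The two ingredients of this proof can be sketched as follows: the map from a hyperedge to a bipartite edge, and the translation to $\delta$ and $p^*$. This is a minimal sketch under stated assumptions. Hyperedges are tuples of vertex indices, and `S` is a tuple of positions with $\hat Q(S) \neq 0$. The formulas for $\delta$ and $p^*$ are the ones quoted in the proof. All names are hypothetical.

```python
import random


def hyperedge_to_bipartite_edge(edge, S, rng=random):
    """Map one ordered k-uniform hyperedge to one edge of the bipartite model.

    Keeps the r vertices at positions S (k-uniform -> r-uniform), then splits
    them at random into one vertex (in V1) and an ordered (r-1)-tuple (a vertex
    of V2).
    """
    sub = [edge[i] for i in S]
    j = rng.randrange(len(sub))          # which vertex goes to V1
    rest = tuple(sub[:j] + sub[j + 1:])  # remaining r-1 vertices, order kept
    return sub[j], rest


def reduction_parameters(qhat, k, r, n, literals=False):
    """delta = 1 + 2^k qhat and p* = 1 / (2^{2k} qhat^2 N^{k - r/2}),
    with N = n vertices (hypergraphs) or N = 2n literals (k-SAT)."""
    N = 2 * n if literals else n
    delta = 1 + 2 ** k * qhat
    p_star = 1.0 / (2 ** (2 * k) * qhat ** 2 * N ** (k - r / 2))
    return delta, p_star
```

Consider the 3-XOR coefficient $\hat Q(\{1,2,3\}) = 1/8$ with $k = r = 3$. For it, `reduction_parameters(1/8, 3, 3, 100)` gives $\delta = 2$ and $p^* = 100^{-3/2} = 10^{-3}$, up to floating-point rounding.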

Note that Theorem 2 says that no further improvement can be gained by analyzing this particular 
reduction to a bipartite stochastic block model. 

There is some evidence that up to constant factors in the clause or hyperedge density, there may
be no better efficient algorithms [35, 18], unless the constraints induce a consistent system of linear
equations. But in the spirit of [15], we can ask if there is in fact a sharp threshold for detection of
planted solutions in these models. In one special case, such sharp thresholds have been conjectured:
[25] have conjectured threshold densities based on fixed points of belief propagation equations. The
planted k-SAT distributions covered, however, are only those with distribution complexity $r = 2$:
those that are known to be solvable with a linear number of clauses. We ask if there are sharp
thresholds for detection in the general case, and in particular for those distributions with distribution
complexity $r \ge 3$ that cannot be solved by Gaussian elimination. In particular, in the case of the
parity distribution we conjecture that there is a sharp threshold for detection.

Conjecture 1. Partition a set of n vertices at random into sets A, B. Add k-uniform hyperedges
independently at random with probability $\delta p$ if the number of vertices in the edge from A is even
and $(2-\delta)p$ if the number of vertices from A is odd. Then for any $\delta \in (0,2)$ there is a constant
$c_\delta$ so that $p = c_\delta n^{-k/2}$ is a sharp threshold for detection of the planted partition by an efficient
algorithm. That is, if $p > (1+\epsilon)c_\delta n^{-k/2}$, then there is a polynomial-time algorithm that detects the
partition whp, and if $p < c_\delta n^{-k/2}$ then no polynomial-time algorithm can detect the partition whp.

This is a generalization to hypergraphs of the SBM conjecture of [15]; the $k = 2$ parity distribution
is that of the stochastic block model. We do not venture a guess as to the precise constant $c_\delta$,
but even a heuristic as to what the constant might be would be very interesting.

3.1. Relation to Goldreich’s generator. [20]'s pseudorandom generator or one-way function can
be viewed as a variant of planted satisfiability. Fix an assignment $\sigma$ to $n$ boolean variables, and fix a
predicate $P : \{\pm 1\}^k \to \{0,1\}$. Now choose $m$ k-tuples of variables uniformly at random, and label
each k-tuple with the evaluation of $P$ on the tuple with the boolean values given by $\sigma$. In essence
this generates a uniformly random k-uniform hypergraph with labels that depend on the planted
assignment and the fixed predicate $P$. The task is to recover $\sigma$ given this labeled hypergraph. The
algorithm we describe above will work in this setting by simply discarding all hyperedges labeled 0
and working with the remaining hypergraph.
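A sketch of this setting samples Goldreich-style labeled tuples and keeps only those labeled 1. The names, the use of distinct variables within each tuple, and the predicate in any example are assumptions made for illustration.

```python
import numpy as np


def goldreich_instance(n, k, m, P, rng=None):
    """Sample m uniformly random k-tuples of distinct variables and label each
    one with P evaluated on the planted assignment sigma."""
    rng = np.random.default_rng(rng)
    sigma = rng.choice([-1, 1], size=n)
    tuples = [tuple(int(i) for i in rng.choice(n, size=k, replace=False))
              for _ in range(m)]
    labels = [P(tuple(int(sigma[i]) for i in t)) for t in tuples]
    return sigma, tuples, labels


def keep_label_one(tuples, labels):
    """Discard hyperedges labelled 0; the rest form a planted hypergraph."""
    return [t for t, y in zip(tuples, labels) if y == 1]
```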

4. Related work 

The stochastic block model has been a source of considerable recent interest. There are many algorithmic
approaches to the problem, including algorithms based on maximum-likelihood methods
[37], belief propagation [15], spectral methods [30], modularity maximization [6], and combinatorial
methods [9], [16], [24], [13]. [12] gave the first algorithm to detect partitions in the sparse,
constant average degree regime. [15] conjectured the precise achievable constant and subsequent
algorithms [29, 31, 8, 2] achieved this bound. Sharp thresholds for full recovery (as opposed to
detection) have been found by [33, 1, 21].

[7] used ideas for reconstructing assignments to random 3-SAT formulas in the planted 3-SAT 
model to show that Goldreich’s construction of a one-way function in [20] is not secure when the 
predicate correlates with either one or two of its inputs. For more on Goldreich’s PRG from a 
cryptographic perspective see the survey of [3]. 

[19] gave an algorithm to recover the partition of $V_1$ in the bipartite stochastic block model to
solve instances of planted random k-SAT and planted hypergraph partitioning using subsampled
power iteration.

A key part of our analysis relies on looking at an auxiliary graph on $V_1$ with edges between
vertices which share a common neighbor; this is known as the one-mode projection of a bipartite
graph. [40] give an approach to recommendation systems using a weighted version of the one-mode
projection. One-mode projections are implicitly used in studying collaboration networks, for
example in [34]'s analysis of scientific collaboration networks. [26] defined a general model of
bipartite block models, and proposed a community detection algorithm that does not use one-mode
projection.

The behavior of the singular vectors of a low rank rectangular matrix plus a noise matrix was
studied by [4]. The setting there is different: the ratio between $n_1$ and $n_2$ converges, and the entries
of the noise matrix are mean 0 variance 1.

[10] and [22] both consider the case of recovering a planted submatrix with elevated mean in a
random rectangular Gaussian matrix.

Notation. All asymptotics are as $n_1 \to \infty$, so for example, ‘$\mathcal{E}$ occurs whp’ means $\lim_{n_1 \to \infty} \Pr(\mathcal{E}) = 1$.

We write $f(n_1) = \tilde O(g(n_1))$ and $f(n_1) = \tilde\Omega(g(n_1))$ if there exist constants $C, c$ so that
$f(n_1) \le C\log^c(n_1) \cdot g(n_1)$ and $f(n_1) \ge g(n_1)/(C\log^c(n_1))$ respectively. For a vector, $\|v\|$ denotes the
$\ell_2$ norm. For a matrix, $\|A\|$ denotes the spectral norm, i.e. the largest singular value (or largest
eigenvalue in absolute value for a symmetric matrix). For ease of reading, $C$ will always denote an
absolute constant, but the value may change during the course of the proofs.

5. Proof of Theorem 1: detection 

In this section we prove Theorem 1, giving an optimal algorithm for detection in the bipartite
stochastic block model when $n_2 = \omega(n_1)$. The main idea of the proof is that almost all of the
information in the bipartite block model is in the subgraph induced by $V_1$ and the vertices of degree
two in $V_2$. From this induced subgraph of the bipartite graph we form a graph G' on $V_1$ by replacing
each path of length two from $V_1$ to $V_2$ back to $V_1$ with a single edge between the two endpoints in
$V_1$. We then apply an algorithm from [29, 31], or [8] to detect the partition.

Proof of Theorem 1. Fix $\epsilon > 0$. Given an instance G of the bipartite block model with

$$p = (1+\epsilon)(\delta-1)^{-2}(n_1 n_2)^{-1/2},$$

we reduce to a graph G' on $V_1$ as follows:

• Sort $V_2$ according to degrees and remove any vertices (with their accompanying edges)
which are not of degree 2.

• We now have a union of 2-edge paths from vertices in $V_1$ to vertices in $V_2$ and back to
vertices in $V_1$. Create a multi-set of edges $\mathcal{E}$ on $V_1$ by replacing each 2-path $u - v - w$ by
the edge $(u, w)$.

• Choose $N$ from the distribution $\mathrm{Poisson}((1+\epsilon)(\delta-1)^{-4} n_1/2)$.

• If $N > |\mathcal{E}|$, then stop and output ‘failure’. Otherwise, select $N$ edges uniformly at random
from $\mathcal{E}$ to form the graph G' on $V_1$, replacing any edge of multiplicity greater than one with
a single edge.

• Apply an SBM algorithm to G' to partition $V_1$.
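The Poisson choice of $N$ and the uniform subsampling steps can be sketched as below. We assume the multiset $\mathcal{E}$ is a Python list of pairs. We read ‘select $N$ edges uniformly at random’ as sampling without replacement, and return `None` for ‘failure’. The function name is hypothetical.

```python
import numpy as np


def poisson_sparsify(edges, n1, delta, eps, rng=None):
    """Draw N ~ Poisson((1+eps)(delta-1)^-4 n1/2), then keep N edges of the
    multiset chosen uniformly without replacement, collapsing multiplicities.

    Returns a set of pairs (u, w) with u < w, or None on 'failure' (N > |E|).
    """
    rng = np.random.default_rng(rng)
    N = rng.poisson((1 + eps) * (delta - 1) ** -4 * n1 / 2)
    if N > len(edges):
        return None  # output 'failure'
    idx = rng.choice(len(edges), size=N, replace=False)
    return {tuple(sorted(edges[i])) for i in idx}
```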

We now determine the distribution of G' conditioned on $\sigma$. Let $\beta_1$ be the bias of $+1$ labels in $\sigma$,
$\beta_1 = \frac{1}{n_1}\sum_{u \in V_1} \sigma(u)$. Conditioned on $\beta_1$, the degrees $d(v_1), \ldots, d(v_{n_2})$ of the vertices of $V_2$ are
independent, identically distributed random variables. Let $Y_i = d(v_i)$. Under the high probability
event that $\beta_1 = o(n_1^{-1/3})$, we can compute

$$\Pr[Y_i = 2 \mid \sigma, \beta_1 = o(n_1^{-1/3})] = \frac{n_1^2 p^2}{2}(1 + o(1)),$$

and so whp $|\mathcal{E}| = \frac{n_1^2 n_2 p^2}{2}(1 + o(1)) = \frac{(1+\epsilon)^2 n_1}{2(\delta-1)^4}(1 + o(1))$, and $N < |\mathcal{E}|$. Note that it is only at
this step that we require the assumption that $n_2 = \omega(n_1)$.

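Conditioned on $\sigma$, the edges in $\mathcal{E}$ are independent and identically distributed, with the distribution of
a given edge $e = (u, w)$ given by

$$p_a = \Pr[e = (u, w) \mid \sigma(u) = \sigma(w)] = \frac{\frac{\delta^2 + (2-\delta)^2}{2}}{\left(\binom{(1+\beta_1)n_1/2}{2} + \binom{(1-\beta_1)n_1/2}{2}\right)\frac{\delta^2 + (2-\delta)^2}{2} + (1-\beta_1^2)\frac{n_1^2}{4}\,\delta(2-\delta)},$$

$$p_b = \Pr[e = (u, w) \mid \sigma(u) \neq \sigma(w)] = \frac{\delta(2-\delta)}{\left(\binom{(1+\beta_1)n_1/2}{2} + \binom{(1-\beta_1)n_1/2}{2}\right)\frac{\delta^2 + (2-\delta)^2}{2} + (1-\beta_1^2)\frac{n_1^2}{4}\,\delta(2-\delta)}.$$

When $\beta_1 = o(n_1^{-1/3})$,

$$p_a = \frac{2(2 - 2\delta + \delta^2)}{n_1^2}(1 + o(1)) \quad \text{and} \quad p_b = \frac{2(2\delta - \delta^2)}{n_1^2}(1 + o(1)).$$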


By Poisson thinning this means that the number of times each $++$ or $--$ edge appears in the
subsampled collection of edges is Poisson with mean $p_a \mathbb{E}N$, each $+-$ edge appears a Poisson
number of times with mean $p_b \mathbb{E}N$, and all of these edge counts are independent.

Now define $a$ so that $\Pr[\mathrm{Poisson}(p_a \mathbb{E}N) \ge 1] = \frac{a}{n_1}$ and $b$ so that $\Pr[\mathrm{Poisson}(p_b \mathbb{E}N) \ge 1] = \frac{b}{n_1}$.

From the construction above, conditioned on $\sigma$ the distribution of G' is that of the stochastic
block model on $V_1$ with partition $\sigma$: each edge interior to the partition is present with probability
$a/n_1$, each crossing edge with probability $b/n_1$, and all edges are independent.

For $\sigma$ such that $\beta_1 = o(n_1^{-1/3})$, we have

$$a = \frac{(1+\epsilon)(2 - 2\delta + \delta^2)}{(\delta-1)^4}(1 + o(1)), \qquad b = \frac{(1+\epsilon)(2\delta - \delta^2)}{(\delta-1)^4}(1 + o(1)).$$


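For these values of $a$ and $b$ we have $a - b = \frac{2(1+\epsilon)}{(\delta-1)^2}(1+o(1))$ and $a + b = \frac{2(1+\epsilon)}{(\delta-1)^4}(1+o(1))$, so
$(a-b)^2 = (1+\epsilon)(1+o(1)) \cdot 2(a+b)$. Hence the condition for detection in the SBM, $(a-b)^2 > 2(a+b)$,
is satisfied, and so whp the algorithms from [29, 31, 8] will find a partition that agrees with $\sigma$ on a
$1/2 + \epsilon'$ fraction of vertices. □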
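The final arithmetic can be checked numerically. This sketch drops the $1 + o(1)$ factors. With $a$ and $b$ as above, $(a-b)^2 / (2(a+b))$ equals $1 + \epsilon$ for every $\delta \neq 1$. The function names are hypothetical.

```python
def reduced_sbm_parameters(delta, eps):
    """Leading-order a, b for the graph G' (the 1 + o(1) factors are dropped)."""
    c = (1 + eps) / (delta - 1) ** 4
    a = c * (2 - 2 * delta + delta ** 2)
    b = c * (2 * delta - delta ** 2)
    return a, b


def ks_condition(a, b):
    """Detection condition (a - b)^2 > 2(a + b) for the two-part SBM."""
    return (a - b) ** 2 > 2 * (a + b)
```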


6. Proof of Theorem 2: impossibility 

The proof of impossibility below the threshold $(a-b)^2 = 2(a+b)$ in [32] proceeds by showing
that the $\log n$ depth neighborhood of a vertex $\rho$, along with the accompanying labels, can be coupled
to a binary symmetric broadcast model on a Poisson Galton-Watson tree. In this model, it was shown
by [17] that reconstruction, i.e. recovering the label of the root given the labels at depth $R$ of the tree,
is impossible as $R \to \infty$ for the corresponding parameter values (the critical case was shown by
[36]).
In the binary symmetric broadcast model, the root of a tree is labeled with a uniformly random
label $+1$ or $-1$, and then each child takes its parent’s label with probability $1 - \eta$ and the opposite
label with probability $\eta$, independently over all of the parent’s children. The process continues in
each successive generation of the tree.

The criterion for non-reconstruction can be stated as $(1 - 2\eta)^2 B < 1$, where $B$ is the branching
number of the tree $T$. The branching number is $B = p_c(T)^{-1}$, where $p_c$ is the critical probability
for bond percolation on $T$ (see [28] for more on the branching number).

Assume first that $n_2 \sim c n_1$ for some constant $c$, and that $p = d/n_1$. Then there is a natural
multitype Poisson branching process that we can associate to the bipartite block model: nodes of
type 1, corresponding to vertices in $V_1$, have a $\mathrm{Poisson}(cd)$ number of children of type 2; nodes
of type 2, corresponding to vertices in $V_2$, have a $\mathrm{Poisson}(d)$ number of children of type 1. The
branching number of this distribution on trees is $\sqrt{c} \cdot d$, an easy calculation by reducing to a one-type
Galton-Watson process by combining two generations into one. Transferring the block model
labeling to the branching process gives $\eta = 1 - \delta/2$ (a neighbor has the same label with probability $\delta/2$),
so $(1 - 2\eta)^2 = (\delta - 1)^2$, and the threshold for reconstruction is given by

$$(\delta - 1)^2 \sqrt{c}\, d < 1$$

or in other words,

$$p < \frac{1}{(\delta - 1)^2\sqrt{n_1 n_2}},$$

exactly the threshold in Theorem 2. In fact, in this case the proof from [32] can be carried out in
essentially the exact same way in our setting.
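The reconstruction criterion in the $n_2 \sim c n_1$ case can be evaluated directly. This sketch uses a hypothetical function name and takes $\eta = 1 - \delta/2$ and $B = \sqrt{c}\, d$ as in the text. At $d = 1/((\delta-1)^2\sqrt{c})$, i.e. $p = 1/((\delta-1)^2\sqrt{n_1 n_2})$, the quantity $(1-2\eta)^2 B$ equals 1.

```python
import math


def kesten_stigum_ratio(delta, c, d):
    """(1 - 2*eta)^2 * B for the two-type tree with n2 = c*n1 and p = d/n1.

    eta = 1 - delta/2 is the flip probability along an edge; the branching
    number is B = sqrt(c) * d (mean c*d^2 offspring over two generations).
    """
    eta = 1 - delta / 2
    B = math.sqrt(c * d * d)
    return (1 - 2 * eta) ** 2 * B
```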

Now take $n_2 = \omega(n_1)$. A complication arises: the distribution of the number of neighbors of
a node of type 1 does not converge (its mean is $n_2 p \to \infty$), and the distribution of the number of
neighbors of a node of type 2 converges to a delta mass at 0. But this can be fixed by ignoring the
vertices in $V_2$ of degree 0 and 1. Now we explore from a vertex $\rho \in V_1$, but discard any vertices
from $V_2$ that do not have a second neighbor. We denote by $\tilde G$ the subgraph of G induced by $V_1$ and
the vertices of $V_2$ of degree at least 2. Let $T$ be the branching process associated to this modified
graph: nodes of type 1 have $\mathrm{Poisson}(d^2)$ children of type 2, and nodes of type 2 have exactly one
child of type 1, where here $p = d/\sqrt{n_1 n_2}$. The branching number of this process is $d$, and the
reconstruction threshold is $(\delta - 1)^2 d < 1$, again giving the threshold $p < \frac{1}{(\delta-1)^2\sqrt{n_1 n_2}}$, as required.
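A first-moment calculation shows why type-1 nodes have about $d^2$ children. The sketch below is label-averaged: every edge is treated as present with probability $p$, the average of $\delta p$ and $(2-\delta)p$. It computes the expected number of neighbors of a fixed $u \in V_1$ that have at least one further neighbor in $V_1$. The function name is hypothetical.

```python
import math


def mean_type2_children(n1, n2, d):
    """Expected number of neighbours v in V2 of a fixed u in V1 that have at
    least one further neighbour in V1, when p = d / sqrt(n1 * n2).

    Label-averaged first moment; it tends to d^2 as n2 / n1 -> infinity.
    """
    p = d / math.sqrt(n1 * n2)
    # P(v has >= 1 neighbour among the other n1 - 1 vertices) = 1 - (1 - p)^(n1 - 1),
    # computed stably with expm1/log1p
    q = -math.expm1((n1 - 1) * math.log1p(-p))
    return n2 * p * q
```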

As in [32], the proof of impossibility will show the stronger statement that conditioned on the
label of a fixed vertex $w \in V_1$ and the graph G, the variance of the label of another fixed vertex
$\rho$ tends to 1 as $n_1 \to \infty$. The proof of this fact has two main ingredients: showing that the depth
$R$ neighborhood of a vertex $\rho$ in the bipartite block model (with vertices of degree 0 and 1 in $V_2$
removed) can be coupled with the branching process described above, and showing that conditioned
on the labels on the boundary of the neighborhood, the label of $\rho$ is asymptotically independent of
the rest of the graph and the labels outside of the neighborhood. We will use the notation from
Section 4 of [32] and indicate the places in which our proof must differ; the most significant is that
we must show that the vertices of degree 0 and 1 in $V_2$ give essentially no information about the
label of $\rho$.

First note that in Proposition 4.2 from [32], $R = \Theta(\log n)$, but for the proof of Theorem 2.1 all
that is required is $R = \omega(1)$. We choose

$$R = \frac{1}{C} \cdot \min\{\log n_1, \log(n_2/n_1)\} = \omega(1).$$

Let $T$ be the branching process described above, starting with a root of type 1. We will denote the labeling functions of nodes of type 1 and type 2 in $T$ by $\tilde\sigma$ and $\tilde\tau$ respectively. We will consider two steps of the exploration process at once, so the depth-0 neighborhood is $\rho$ itself, and the depth-1 neighborhood is $\rho$, its neighbors (of degree at least 2), and the neighbors of these neighbors. The depth-$r$ neighborhood then includes those vertices in $V_1$ at distance at most $2r$ from $\rho$. Let $G_r$ be this depth-$r$ neighborhood in $\tilde G$, and let $\sigma_r, \tau_r$ be the labelings of $V_1$ and $V_2$ restricted to the vertices in $G_r$. Define $T_r, \tilde\sigma_r, \tilde\tau_r$ as the same objects for the tree process. Let $\partial^1 G_r$, $\partial^1 T_r$ be the sets of vertices from $V_1$ (respectively nodes of type 1) in the last layer, and $\partial^2 G_r$, $\partial^2 T_r$ the vertices from $V_2$ (respectively nodes of type 2) in the last layer. Let $V_r = (V_1 \cup V_2) \setminus V(G_r)$. We will show:


Lemma 1. With $R$ as above, there is a coupling so that $(G_R, \sigma_R, \tau_R) = (T_R, \tilde\sigma_R, \tilde\tau_R)$ whp.


Proof. $T$ can be constructed from three sequences of independent random variables: $Y_u^+ \sim \mathrm{Poisson}(d^2\delta/2)$ and $Y_u^- \sim \mathrm{Poisson}(d^2(2-\delta)/2)$ for $u$ of type 1 in $T$, and $X_v \sim \mathrm{Bernoulli}(\delta/2)$ for $v$ of type 2 in $T$.

To create the branching process, we start with the root $\rho$ of type 1, and assign it a $+1$ or $-1$ label uniformly at random. We then give it $Y_\rho^+$ type-2 children of the same label and $Y_\rho^-$ type-2 children of the opposite label. Altogether the number of children has a $\mathrm{Poisson}(d^2)$ distribution, and the labels are selected independently to agree with $\rho$ with probability $\delta/2$ and to disagree with probability $1 - \delta/2$. Now each child $v$ of type 2 has exactly one child of its own, whose label agrees with that of $v$ if $X_v = 1$ and disagrees otherwise. Then the process continues inductively.
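The construction of $T$ can be simulated directly. The sketch below (Python/NumPy; `simulate_tree` is a hypothetical helper, not code from the paper) draws $Y_u^\pm$ and $X_v$ exactly as described and returns the labels of the type-1 nodes generation by generation. A useful sanity check is that a type-1 child agrees with its grandparent with probability $(\delta/2)^2 + (1-\delta/2)^2$.

```python
import numpy as np

def simulate_tree(d, delta, depth, rng):
    """Simulate `depth` generations of type-1 nodes of the process T.

    Returns (root_label, generations), where generations[k] holds the labels
    of the type-1 nodes at depth k (generations[0] is the root).
    """
    root = int(rng.choice([-1, 1]))          # uniform +/-1 root label
    generations = [np.array([root])]
    for _ in range(depth):
        parents = generations[-1]
        # Y_u^+ children with the parent's label, Y_u^- with the opposite label
        y_plus = rng.poisson(d**2 * delta / 2, size=parents.size)
        y_minus = rng.poisson(d**2 * (2 - delta) / 2, size=parents.size)
        type2 = np.concatenate([np.repeat(parents, y_plus),
                                np.repeat(-parents, y_minus)])
        # each type-2 node has exactly one type-1 child, agreeing iff X_v = 1
        x = rng.random(type2.size) < delta / 2
        generations.append(np.where(x, type2, -type2))
    return root, generations

rng = np.random.default_rng(0)
root, gens = simulate_tree(d=1.5, delta=1.6, depth=4, rng=rng)
print(root, [g.size for g in gens])
```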

Now consider exploring the depth-$R$ neighborhood of $\rho$ in $\tilde G$. We index the vertices from $V_1$ in the order in which we encounter them in this breadth-first exploration. We explore two layers of the neighborhood at once: the active vertex $u_i$ will always be from $V_1$. To explore from $u_i$, we reveal all edges from $u_i$ to unexplored vertices in $V_2$; call these neighbors $N(u_i)$. We then set all $v \in N(u_i)$ to be explored, and query all edges from $N(u_i)$ to unexplored vertices in $V_1$; call these vertices $N^2(u_i)$, as they are all connected by a path of length 2 to $u_i$ in $\tilde G$. Set all vertices in $N^2(u_i)$ to explored, and place them in a FIFO queue. Then set $u_i$ to dead, take the next vertex $u_{i+1}$ from the queue, set it to active, and repeat.
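The two-layer exploration can be written as an ordinary breadth-first search that alternates sides. This is a sketch under our own conventions: `explore_two_steps` is a hypothetical name, the graph is given as a 0/1 biadjacency matrix, and restricting to $\tilde G$ (dropping the columns of degree below 2) is left to the caller.

```python
from collections import deque

import numpy as np

def explore_two_steps(M, rho, max_active):
    """Two-layer breadth-first exploration from rho in V1.

    Each step takes the active u in V1, reveals N(u) (unexplored neighbours in
    V2), then N^2(u) (unexplored V1 neighbours of N(u)), marks them explored
    and appends N^2(u) to a FIFO queue. Returns the activation order and the
    V2 vertices revealed from each active vertex.
    """
    M = np.asarray(M, dtype=bool)
    n1, n2 = M.shape
    explored1 = np.zeros(n1, dtype=bool)
    explored2 = np.zeros(n2, dtype=bool)
    explored1[rho] = True
    queue = deque([rho])
    order, revealed = [], {}
    while queue and len(order) < max_active:
        u = queue.popleft()                          # u becomes active
        order.append(u)
        nbrs2 = np.flatnonzero(M[u] & ~explored2)    # N(u)
        explored2[nbrs2] = True
        revealed[u] = nbrs2.tolist()
        if nbrs2.size:
            # unexplored V1 vertices adjacent to some v in N(u): N^2(u)
            nbrs1 = np.flatnonzero(M[:, nbrs2].any(axis=1) & ~explored1)
            explored1[nbrs1] = True
            queue.extend(nbrs1.tolist())
        # u is now dead
    return order, revealed

M = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
print(explore_two_steps(M, rho=0, max_active=10))
```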

Let $N_{aa}(u_i), N_{ab}(u_i), N_{ba}(u_i), N_{bb}(u_i)$ be the numbers of paths of length 2 from $u_i$ to an unexplored vertex in $V_1$, with the subscripts denoting whether the labels along the path agree ($a$) or disagree ($b$) with the label of $u_i$; e.g. if $\sigma(u_i) = +1$, then $N_{ba}(u_i)$ is the number of paths that go through a vertex $v \in V_2$ with label $-1$ and then to an unexplored vertex $w \in V_1$ with label $+1$. Let $V_1^+(i), V_1^-(i)$ be the numbers of unexplored vertices in $V_1$ with the respective labels at the moment $u_i$ becomes active, and likewise for $V_2^+(i), V_2^-(i)$. If we condition on the biases of $\sigma$ and $\tau$,

$$\beta_1 = \frac{1}{n_1}\sum_{u \in V_1}\sigma(u), \qquad \beta_2 = \frac{1}{n_2}\sum_{v \in V_2}\tau(v),$$

then at each step of the exploration, the distribution of the $N_{**}$'s depends only on the $V_*^{\pm}(i)$'s.

As in [32], let $A_r$ be the event that no vertex in $V_{r-1}$ has more than one neighbor in $G_{r-1}$, let $B_r$ be the event that no vertex in $\partial^1 G_r$ has more than one neighbor in $\partial^2 G_r$, and define the event $D_r = \{d(v) \le 2 \text{ for all } v \in \partial^2 G_r\}$. Then, analogously to Lemma 4.3 in [32], we have:


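Lemma 2. Suppose that

(1) $(G_{r-1}, \sigma_{r-1}, \tau_{r-1}) = (T_{r-1}, \tilde\sigma_{r-1}, \tilde\tau_{r-1})$;

(2) for every $u \in \partial^1 G_{r-1}$, $N_{aa}(u) + N_{ab}(u) = Y_u^+ \sim \mathrm{Poisson}(d^2\delta/2)$ and $N_{ba}(u) + N_{bb}(u) = Y_u^- \sim \mathrm{Poisson}(d^2(2-\delta)/2)$;

(3) for every $u \in \partial^1 G_{r-1}$,

$$\sum_{\substack{v \in \partial^2 G_r,\ v \sim u\\ \sigma(u) = \tau(v)}} X_v = N_{aa}(u), \qquad \sum_{\substack{v \in \partial^2 G_r,\ v \sim u\\ \sigma(u) \ne \tau(v)}} X_v = N_{bb}(u);$$

(4) $A_r, B_r, D_r$ hold.

Then $(G_r, \sigma_r, \tau_r) = (T_r, \tilde\sigma_r, \tilde\tau_r)$.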

Next we define $C_r = \{|\partial G_s| \le \min\{\log n_1, \log(n_2/n_1)\} \text{ for all } s \le r+1\}$. Then:

Lemma 3. Whp, $A_r, B_r, C_r$, and $D_r$ hold for all $1 \le r \le R$, and $|G_R| = O(\min\{n_1^{1/8}, (n_2/n_1)^{1/8}\})$.

Proof. As in Lemma 4.4 from [32], stochastic domination and a Chernoff bound show that $A_r, B_r, C_r$ hold whp: the distribution of $|N^2(u_i)|$ is dominated by a $\mathrm{Bin}(n_1, 4d^2/n_1)$. If $v \in V_2$ is revealed to be a neighbor of $u_i$, then the probability that it has at least 2 additional neighbors in $V_1$ is bounded by $O(n_1^2p^2) = O(n_1/n_2)$. Given that $|G_R| = O(\min\{n_1^{1/8}, (n_2/n_1)^{1/8}\})$, a union bound gives that $D_r$ holds for all $1 \le r \le R$ whp. □

Finally, we complete the proof of Lemma 1. Condition on the event that $\beta_1 = O(n_1^{-1/3})$ and $\beta_2 = O(n_2^{-1/3})$, which occurs with probability at least $1 - \exp(-\Theta(n_1^{1/3}))$. Condition also on the event that the number of edges incident to all explored vertices in $V_1$ is at most $2n_2p\min\{n_1^{1/8}, (n_2/n_1)^{1/8}\}$. Under these two events we have $|V_1^+(i)|, |V_1^-(i)| = \frac{n_1}{2}(1 + O(n_1^{-1/3}))$ and $|V_2^+(i)|, |V_2^-(i)| = \frac{n_2}{2}(1 + O(n_2^{-1/3}))$ for all steps $i$ of the exploration.

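For $u_i \in V_1$, the distribution of the number of its unexplored neighbors of degree 2 with label $+1$ is $\mathrm{Bin}(|V_2^+(i)|, P^{\sigma(u_i),+})$, and the distribution of the number of its unexplored neighbors of degree 2 with label $-1$ is $\mathrm{Bin}(|V_2^-(i)|, P^{\sigma(u_i),-})$, where

$$P^{+,+} = |V_1^+(i)|\,\delta^2p^2\,(1-\delta p)^{\cdots}(1-(2-\delta)p)^{\cdots} + |V_1^-(i)|\,\delta(2-\delta)p^2\,(1-\delta p)^{\cdots}(1-(2-\delta)p)^{\cdots} = n_1\delta p^2(1 + O(n_1^{-1/3})),$$

the omitted exponents accounting for the probability that $v$ has no further neighbors in $V_1$, and likewise

$$P^{-,-} = n_1\delta p^2(1 + O(n_1^{-1/3})), \qquad P^{+,-}, P^{-,+} = n_1(2-\delta)p^2(1 + O(n_1^{-1/3})).$$

From Lemma 4.6 in [32], we then have that for $\sigma(u_i) \in \{\pm 1\}$,

$$\bigl\|\mathrm{Bin}(|V_2^{\sigma(u_i)}(i)|, P^{\sigma(u_i),\sigma(u_i)}) - \mathrm{Poisson}(d^2\delta/2)\bigr\|_{TV} = O(n_1^{-1/3}),$$
$$\bigl\|\mathrm{Bin}(|V_2^{-\sigma(u_i)}(i)|, P^{\sigma(u_i),-\sigma(u_i)}) - \mathrm{Poisson}(d^2(2-\delta)/2)\bigr\|_{TV} = O(n_1^{-1/3}).$$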

Since we have $|G_R| \le n_1^{1/8}$ whp, a union bound over $r = 1, \ldots, R$ shows that there is a coupling so that whp, for all $r \le R$ and every $u \in \partial^1 G_{r-1}$, $N_{aa}(u) + N_{ab}(u) = Y_u^+$ and $N_{ba}(u) + N_{bb}(u) = Y_u^-$.

Next, we show that the probability that the second neighbor of a vertex of degree 2 in $V_2$ has the same label is close to $\delta/2$. Let $u_i$ be the current active vertex, $v$ a neighbor of $u_i$ of degree 2, and $w$ the second unexplored neighbor of $v$. Then

$$\Pr[\sigma(w) = \tau(v) \mid u_i \sim v,\ d(v) = 2] = \delta/2 + O(n_1^{-1/3}).$$

This shows that the coupling can be extended to the $X_v$'s, and that whp under this coupling

$$(G_R, \sigma_R, \tau_R) = (T_R, \tilde\sigma_R, \tilde\tau_R).$$
□

Let $V_2^{(0)}$ and $V_2^{(1)}$ be the subsets of $V_2$ of degree 0 and 1 respectively, and let $V_2^{(\ge 2)}$ be the set of vertices of $V_2$ of degree at least 2. Recall that $\tilde G$ is the subgraph of $G$ induced by $V_1$ and $V_2^{(\ge 2)}$. From $\tilde G$ we can determine the set $V_2^{(\le 1)} = V_2^{(0)} \cup V_2^{(1)}$, but not the two sets individually.
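Concretely, the three classes and $\tilde G$ can be read off the column sums of the biadjacency matrix. In the sketch below (hypothetical helper `split_by_degree`), `M_tilde` keeps only the columns of degree at least 2. Assuming $n_2$ is known, the number of discarded columns is $|V_2^{(\le 1)}|$, but nothing in `M_tilde` distinguishes the degree-0 columns from the degree-1 columns.

```python
import numpy as np

def split_by_degree(M):
    """Split V2 by degree and build the biadjacency matrix of G-tilde."""
    M = np.asarray(M, dtype=bool)
    deg2 = M.sum(axis=0)                  # degrees of the V2 vertices
    V2_0 = np.flatnonzero(deg2 == 0)
    V2_1 = np.flatnonzero(deg2 == 1)
    V2_ge2 = np.flatnonzero(deg2 >= 2)
    M_tilde = M[:, V2_ge2]                # G-tilde: V1 together with V2^(>=2)
    return V2_0, V2_1, V2_ge2, M_tilde

M = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
V2_0, V2_1, V2_ge2, M_tilde = split_by_degree(M)
print(V2_0, V2_1, V2_ge2, M_tilde.shape)
```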

The following is an analogue of Lemma 4.7 in [32]. It says that, conditioned on the labels at depth $R$ in $\tilde G$ from the root $\rho$, neither the graph outside the $R$-neighborhood nor the vertices in $V_2^{(\le 1)}$ contain significant information about the label of $\rho$.

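Lemma 4. Let $A, B, C$ be a partition of $V_1 \cup V_2^{(\ge 2)}$ so that $B$ separates $A$ and $C$ in $\tilde G$. Assume $|A \cup B| = o(\sqrt{n_1})$. Then

$$\Pr[\sigma_A, \tau_A \mid \sigma_{B\cup C}, \tau_{B\cup C}, G] = (1 + o(1))\Pr[\sigma_A, \tau_A \mid \sigma_B, \tau_B, \tilde G]$$

whp over $G$, $\sigma$, and $\tau$.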

We delay the proof of Lemma 4 to the Appendix. 


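Now we can finish the proof of Theorem 2. By the monotonicity of conditional variance,

$$\mathrm{Var}(\sigma(\rho) \mid G, \sigma(w), \sigma_{\partial^1 G_R}, \tau_{\partial^2 G_R}) \le \mathrm{Var}(\sigma(\rho) \mid G, \sigma(w)).$$

Whp $w \notin G_R$, and so by Lemma 4,

$$\mathrm{Var}(\sigma(\rho) \mid G, \sigma(w), \sigma_{\partial^1 G_R}, \tau_{\partial^2 G_R}) = \mathrm{Var}(\sigma(\rho) \mid \tilde G, \sigma_{\partial^1 G_R}, \tau_{\partial^2 G_R}) + o(1)$$

(since $\sigma(\rho)$ and $\sigma(w)$ are independent given $\tilde G$, $\sigma_{\partial^1 G_R}$, $\tau_{\partial^2 G_R}$). By Lemma 1,

$$\mathrm{Var}(\sigma(\rho) \mid \tilde G, \sigma_{\partial^1 G_R}, \tau_{\partial^2 G_R}) - \mathrm{Var}(\tilde\sigma(\rho) \mid T, \tilde\sigma_{\partial^1 T_R}, \tilde\tau_{\partial^2 T_R}) \to 0.$$

From the results of [17] and the condition $p < \frac{1}{(\delta-1)^2\sqrt{n_1n_2}}$,

$$\mathrm{Var}(\tilde\sigma(\rho) \mid T, \tilde\sigma_{\partial^1 T_R}, \tilde\tau_{\partial^2 T_R}) = 1 - o(1).$$

Thus, whp,

$$\mathrm{Var}(\sigma(\rho) \mid G, \sigma(w), \sigma_{\partial^1 G_R}, \tau_{\partial^2 G_R}) = 1 - o(1),$$

and by the first inequality $\mathrm{Var}(\sigma(\rho) \mid G, \sigma(w)) = 1 - o(1)$ as well. This implies that the labels of $\rho$ and $w$ are asymptotically independent, and in particular proves Theorem 2.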


7. Proof of Theorem 3: Recovery 

We will follow a similar framework to prove both parts of Theorem 3. Recalling that $M$ is the $n_1 \times n_2$ biadjacency matrix, let $B = MM^T - \mathrm{diag}(MM^T)$ and $D_V = \mathrm{diag}(MM^T)$.

A simple computation shows that the second eigenvector of $\mathbb{E}B$ is the vector $\sigma$ that we wish to recover; we will consider the different perturbations of $\mathbb{E}B$ that arise with the three spectral algorithms and show that at the respective thresholds, the second eigenvector of the resulting matrix is close to $\sigma$. To analyze the diagonal deletion SVD, we must show that the second eigenvector of $B$ is highly correlated with $\sigma$ (the addition of a constant multiple of the identity matrix does not change the eigenvectors). The main step is to bound the spectral norm $\|B - \mathbb{E}B\|$. Since the entries of $B$ are not independent, we will decompose $B$ into a sequence of matrices based on subgraphs induced by vertices of a given degree in $V_2$. This (Lemma 5) is the most technical part of the work.

To analyze the unmodified SVD, we write $MM^T = \mathbb{E}B + (B - \mathbb{E}B) + \mathbb{E}D_V + (D_V - \mathbb{E}D_V)$. The left singular vectors of $M$ are the eigenvectors of $MM^T$. $\mathbb{E}B$ has $\sigma$ as its second eigenvector, and $\mathbb{E}D_V$ is a multiple of the identity matrix, so adding it does not change the eigenvectors. As above we bound $\|B - \mathbb{E}B\|$, and what remains is to show that the difference of the matrix $D_V$ with its expectation has small spectral norm at the respective thresholds; this involves simple bounds on the fluctuations of independent random variables.

We will assume that $\sigma$ and $\tau$ assign $+1$ and $-1$ labels to an equal number of vertices; this allows for a clearer presentation, but is not necessary to the argument. We will treat $\sigma$ and $\tau$ as unknown but fixed, and so expectations and probabilities will all be conditioned on the labelings.
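The split $MM^T = B + D_V$ used throughout this section is easy to check numerically: $D_V$ carries the $V_1$ degrees on its diagonal, while $B_{ij}$ counts the paths $i \to v \to j$ through $V_2$. The helper `split_gram` below is our own illustrative name.

```python
import numpy as np

def split_gram(M):
    """Return (B, D_V) with M M^T = B + D_V."""
    M = np.asarray(M, dtype=float)
    G = M @ M.T
    D_V = np.diag(np.diag(G))   # diag(M M^T): degrees of V1 vertices for 0/1 M
    B = G - D_V                 # off-diagonal: numbers of length-2 paths i -> v -> j
    return B, D_V

rng = np.random.default_rng(0)
M = (rng.random((20, 50)) < 0.2).astype(int)
B, D_V = split_gram(M)
print(np.diag(D_V)[:5], M.sum(axis=1)[:5])
```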

The main technical lemma is the following: 


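Lemma 5. Define $B, D_V$ as above. Assume $n_1$, $n_2$, and $p$ are as in Theorem 3. Then there exists an absolute constant $C$ so that

(1) $\mathbb{E}B = \lambda_1 J/n_1 + \lambda_2\,\sigma\sigma^T/n_1$, up to a multiple of the identity matrix, with $\lambda_1 = n_1n_2p^2$ and $\lambda_2 = (\delta-1)^2n_1n_2p^2$, where $J$ is the all-ones $n_1 \times n_1$ matrix.

(2) For $p \ge \log n_1/\sqrt{n_1n_2}$, $\|B - \mathbb{E}B\| \le C\sqrt{n_1n_2}\,p$ whp.

(3) $\mathbb{E}D_V$ is a multiple of the identity matrix.

(4) For $p \ge \log n_1/\sqrt{n_1n_2}$, $\|D_V - \mathbb{E}D_V\| \le C\sqrt{n_2p\log n_1}$ whp.

This is proved in Appendix C.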

We will also use the following lemma from [27] to round a unit vector with high correlation with $\sigma$ to a $\pm 1$ vector that denotes a partition:

Lemma 6 ([27]). For any $x \in \{-1,+1\}^n$ and $y \in \mathbb{R}^n$ with $\|y\| = 1$ we have

$$d(x, \mathrm{sign}(y)) \le n\,\Bigl\|\frac{x}{\sqrt n} - y\Bigr\|^2,$$

where $d$ denotes the Hamming distance.
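Lemma 6 is the rounding step of the algorithm. The short sketch below (hypothetical helper `round_and_hamming`) rounds $y$ by signs, sending zero coordinates to $+1$ by our own convention, and reports both sides of the inequality. The bound holds coordinatewise: wherever $\mathrm{sign}(y_i) \ne x_i$ we have $|x_i/\sqrt n - y_i| \ge 1/\sqrt n$.

```python
import numpy as np

def round_and_hamming(x, y):
    """Round y by signs and compare with x in {-1, +1}^n.

    Returns (d(x, sign(y)), n * ||x / sqrt(n) - y||^2).
    """
    n = x.size
    z = np.where(y >= 0, 1, -1)          # zero coordinates rounded to +1
    hamming = int(np.sum(z != x))
    bound = n * float(np.sum((x / np.sqrt(n) - y) ** 2))
    return hamming, bound

x = np.array([1, -1, 1, -1])
y = np.array([0.5, 0.5, 0.5, -0.5])      # a unit vector
print(round_and_hamming(x, y))
```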


The next lemma is a classic eigenvector perturbation theorem. Denote by $P_A(S)$ the orthogonal projection onto the subspace spanned by the eigenvectors of $A$ corresponding to those of its eigenvalues that lie in $S$.

Lemma 7 ([14]). Let $A$ be an $n \times n$ symmetric matrix with eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \cdots$, with $|\lambda_k| - |\lambda_{k+1}| \ge 2\delta$. Let $B$ be a symmetric matrix with $\|B\| \le \delta$. Let $A_k$ and $(A+B)_k$ be the spaces spanned by the top $k$ eigenvectors of the respective matrices. Then

$$\sin(A_k, (A+B)_k) = \|P_{A_k} - P_{(A+B)_k}\| \le \frac{\|B\|}{\delta}.$$

In particular, if $|\lambda_1| - |\lambda_2| \ge 2\delta$, $|\lambda_2| - |\lambda_3| \ge 2\delta$, $\|B\| \le \delta$, and $e_2(A), e_2(A+B)$ are the second (unit) eigenvectors of $A$ and $A+B$, respectively, satisfying $e_2(A) \cdot e_2(A+B) \ge 0$, then

$$\|e_2(A) - e_2(A+B)\| \le \frac{4\|B\|}{\delta}.$$

Proof. In the particular case, let $u_1, u_2$ be the first two eigenvectors of $A$, and $v_1, v_2$ the first two eigenvectors of $A+B$, with signs chosen so that $u_1 \cdot v_1, u_2 \cdot v_2 \ge 0$. Let $\epsilon = \|B\|/\delta$. First we apply the lemma with $k = 1$ to get $\|P_{u_1} - P_{v_1}\| \le \epsilon$. Applying this to $v_1$, we get $\|P_{u_1}v_1 - v_1\| \le \epsilon$, and so $\|P_{u_1}v_1\| \ge 1 - \epsilon$. The triangle inequality gives $\|u_1 - v_1\| \le 2\epsilon$. Now apply the lemma with $k = 2$ to get $\|P_{A_2} - P_{(A+B)_2}\| \le \epsilon$. Apply this to $v_2$ and use the triangle inequality again to get $\|u_2 - v_2\| \le 4\epsilon$. □
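The subspace bound of Lemma 7 can be sanity-checked numerically. The sketch below projects onto the eigenvectors of the $k$ eigenvalues of largest modulus, matching the ordering $|\lambda_1| \ge |\lambda_2| \ge \cdots$ in the lemma. Here `delta` is the gap parameter of Lemma 7, not the model parameter, and the test matrices are arbitrary choices of ours.

```python
import numpy as np

def top_k_projector(A, k):
    """Projector onto the eigenvectors of symmetric A belonging to its k
    eigenvalues of largest absolute value."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:k]
    V = vecs[:, idx]
    return V @ V.T

def lemma7_sides(A, B, k, delta):
    """Return (||P_{A_k} - P_{(A+B)_k}||, ||B|| / delta) in spectral norm."""
    lhs = np.linalg.norm(top_k_projector(A, k) - top_k_projector(A + B, k), 2)
    rhs = np.linalg.norm(B, 2) / delta
    return lhs, rhs

rng = np.random.default_rng(0)
n = 50
# |lambda_1| - |lambda_2| = 4 and |lambda_2| - |lambda_3| >= 5, so delta = 2 works
A = np.diag(np.concatenate([[10.0, 6.0], rng.uniform(-1.0, 1.0, n - 2)]))
W = rng.standard_normal((n, n))
B = (W + W.T) / 2
B /= np.linalg.norm(B, 2)        # ||B|| = 1 <= delta
for k in (1, 2):
    print(k, lemma7_sides(A, B, k, delta=2.0))
```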


Now using Lemmas 5, 6, and 7 we prove parts 1 and 2 of Theorem 3. 

Diagonal deletion SVD. Let $p \ge n_1^{-1/2}n_2^{-1/2}\log n_1$. Part 1 of Lemma 5 shows that if we had access to the second eigenvector of $\mathbb{E}B$, we would recover $\sigma$ exactly. (The addition of a multiple of the identity matrix does not change the eigenvectors.) Instead we have access to $B = \mathbb{E}B + (B - \mathbb{E}B)$, a noisy version of the matrix we want. We use a matrix perturbation inequality to show that the top eigenvectors of the noisy version are not too far from the original eigenvectors.

Let $y_1$ and $y_2$ be the top two eigenvectors of $B$, let $\mathcal{B}$ be the space spanned by $y_1$ and $y_2$, and let $(\mathbb{E}B)_2$ be the space spanned by the top two eigenvectors of $\mathbb{E}B$. Then Lemma 7 gives

$$\sin((\mathbb{E}B)_2, \mathcal{B}) \le \frac{C\|B - \mathbb{E}B\|}{\lambda_2} \le \frac{Cn_1^{1/2}n_2^{1/2}p}{(\delta-1)^2n_1n_2p^2} = O\Bigl(\frac{1}{\log n_1}\Bigr),$$

where the inequality holds whp by Lemma 5. Assuming $\delta \in (0, 2)$, we use the particular case of Lemma 7 to show that $\|y_2 - \sigma/\sqrt{n_1}\| = O(\log^{-1} n_1)$. We round $y_2$ by signs to get $z$, and then apply Lemma 6 to show that whp the algorithm recovers a $1 - o(1)$ fraction of the coordinates of $\sigma$. (If $\delta = 0$ or $2$, then instead of taking the second eigenvector, we take the component of $B$ perpendicular to the all-ones vector and get the same result.)
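As a small end-to-end illustration of the diagonal deletion SVD analysed above, the following Python/NumPy sketch samples a bipartite block model with balanced labels, forms $B = MM^T - \mathrm{diag}(MM^T)$, takes the eigenvector of its second-largest eigenvalue, and rounds it by signs as in Lemma 6. The helper names and the parameter values ($n_1 = 200$, $n_2 = 10^4$, $p = 0.02$, $\delta = 1.8$) are our own illustrative choices, far from the asymptotic regime of Theorem 3; this is a sketch, not the authors' code.

```python
import numpy as np

def sample_bipartite_sbm(n1, n2, p, delta, rng):
    """Biadjacency matrix with P[u ~ v] = delta*p if sigma(u) == tau(v),
    else (2 - delta)*p, for balanced labels sigma on V1 and tau on V2."""
    sigma = rng.permutation(np.repeat([1, -1], n1 // 2))
    tau = rng.permutation(np.repeat([1, -1], n2 // 2))
    probs = np.where(np.equal.outer(sigma, tau), delta * p, (2 - delta) * p)
    M = (rng.random((n1, n2)) < probs).astype(float)
    return M, sigma, tau

def diagonal_deletion_svd(M):
    """Sign-rounded eigenvector of the second-largest eigenvalue of
    B = M M^T - diag(M M^T)."""
    G = M @ M.T
    B = G - np.diag(np.diag(G))
    _, vecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    y2 = vecs[:, -2]
    return np.where(y2 >= 0, 1, -1)

def overlap(sigma, z):
    """Fraction of V1 labelled correctly, up to a global sign flip."""
    agree = float(np.mean(sigma == z))
    return max(agree, 1.0 - agree)

rng = np.random.default_rng(0)
M, sigma, _ = sample_bipartite_sbm(200, 10_000, 0.02, 1.8, rng)
print(overlap(sigma, diagonal_deletion_svd(M)))
```

With these parameters $\lambda_2 = (\delta-1)^2 n_1 n_2 p^2 = 512$, well above the fluctuations of $B$, so a high overlap is expected.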


The SVD. Let $p \ge n_1^{-2/3}n_2^{-1/3}\log n_1$. Let $y_1$ and $y_2$ be the top two left singular vectors of $M$, and let $\mathcal{M}_2$ be the space spanned by $y_1$ and $y_2$; $y_1$ and $y_2$ are the top two eigenvectors of $MM^T = B + D_V$. Again Lemma 7 gives that whp,

$$\sin((\mathbb{E}B)_2, \mathcal{M}_2) \le C\,\frac{\|B - \mathbb{E}B\| + \|D_V - \mathbb{E}D_V\|}{\lambda_2} \le \frac{C_1n_1^{1/2}n_2^{1/2}p + C_2\sqrt{n_2p\log n_1}}{(\delta-1)^2n_1n_2p^2} = O\Bigl(\frac{1}{\log n_1}\Bigr).$$

This gives $\|y_2 - \sigma/\sqrt{n_1}\| = O(\log^{-1} n_1)$, and shows that the SVD algorithm recovers $\sigma$ whp. Note that in this case $\|D_V - \mathbb{E}D_V\| \gg \|B - \mathbb{E}B\|$. It is these fluctuations on the diagonal that explain the poor performance of the SVD and its need for a higher edge density for success.


8. Proof of Theorem 4: Failure of the vanilla SVD


Here we again use a matrix perturbation lemma, but in the opposite way: we will show that the ‘noise matrix’ $(D_V - \mathbb{E}D_V)$ has a large spectral norm (and an eigenvalue gap), and thus adding the ‘signal matrix’ approximately preserves the space spanned by the top eigenvectors. This shows that the top eigenvectors of $B + D_V$ have almost all their weight on a small number of coordinates, which is enough to conclude that they cannot be close to the planted vector $\sigma$.

The perturbation lemma we use is a generalization of the Davis-Kahan theorem found in [5]. 


Lemma 8 ([5]). Let $A$ and $B$ be $n \times n$ symmetric matrices with the eigenvalues of $A$ ordered $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Suppose $r > k$, $\lambda_k - \lambda_r \ge 2\delta$, and $\|B\| \le \delta$. Let $A_r$ denote the subspace spanned by the first $r$ eigenvectors of $A$, and likewise for $(A+B)_k$. Then

$$\|P_{A_r^{\perp}}P_{(A+B)_k}\| \le \frac{\|B\|}{\delta}.$$

In particular, if $v_k$ is the $k$th unit eigenvector of $A + B$, then there is some unit vector $u \in A_r$ so that

$$\|u - v_k\| \le \frac{4\|B\|}{\delta}.$$


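Proof. This lemma is a special case of Theorem VII.3.1 from [5], itself a generalization of the Davis-Kahan theorem. In the particular case, write $v_k = v_k^{A} + v_k^{\perp}$, where $v_k^{A} \in A_r$ and $v_k^{\perp} \in A_r^{\perp}$. Let $\epsilon = \|B\|/\delta$. Then, multiplying out, we get

$$P_{A_r^{\perp}}P_{(A+B)_k}v_k = P_{A_r^{\perp}}v_k = v_k^{\perp}.$$

We see that $\|v_k^{\perp}\| \le \epsilon$, and thus $\|v_k^{A}\| \ge 1 - \epsilon$. Take $u = v_k^{A}/\|v_k^{A}\|$ and use the triangle inequality to complete the lemma: $u \in A_r$ and $\|u - v_k\| \le 4\epsilon$. □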


We also need to analyze the degrees of the vertices in $V_1$. The following lemma gives some basic information about the degree sequence:

Lemma 9. Let $d_1, \ldots, d_{n_1}$ be the sequence of degrees of vertices in $V_1$. Then there exist constants $c_1, c_2, c_3$ so that

(1) The $d_i$'s are independent and identically distributed, with $d_i \sim \mathrm{Bin}(n_2/2, \delta p) + \mathrm{Bin}(n_2/2, (2-\delta)p)$.

(2) $\mathbb{E}d_i = n_2p$.

(3) Whp, $\max_i d_i \le n_2p + c_1\sqrt{n_2p\log n_1}$.

(4) Whp, $|\{i : d_i \ge n_2p + c_2\sqrt{n_2p\log n_1}\}| \ge$ …

(5) Whp, $|\{i : d_i \ge n_2p + c_3\sqrt{n_2p\log\log n_1}\}| \le n_1/\log n_1$.

The lemma follows from basic Chernoff bounds and the first- and second-moment methods. Now, 
we can finish the proof of Theorem 4. 

Proof of Theorem 4. Let $p = c\,n_1^{-2/3}n_2^{-1/3}$. The left singular vectors of $M$ are the eigenvectors of $B + D_V$. Recall that $D_V$ is a diagonal matrix whose $i$th entry is the degree of the $i$th vertex of $V_1$. $\mathbb{E}D_V$ is therefore a multiple of the identity matrix, and so subtracting $\mathbb{E}D_V$ from $B + D_V$ does not change its eigenvectors. The standard basis vectors form an orthonormal set of eigenvectors of $D_V - \mathbb{E}D_V$.

For the constants $c_2, c_3$ in Lemma 9, let $\eta_1 = c_2\sqrt{n_2p\log n_1}$ and $\eta_2 = c_3\sqrt{n_2p\log\log n_1}$. Order the eigenvalues of $D_V - \mathbb{E}D_V$ as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n_1}$, and let $r$ be the smallest integer such that $\lambda_r \le \eta_2$. Then we have $\lambda_i - \lambda_r \ge C\sqrt{n_2p\log n_1}$ for every $i$ with $\lambda_i \ge \eta_1$. From Lemma 9, $r \le n_1/\log n_1$.
We now bound

$$\|B\| \le \|\mathbb{E}B\| + \|B - \mathbb{E}B\| \le n_1n_2p^2 + Cn_1^{1/2}n_2^{1/2}p.$$

Now Lemma 8 says that if $v_i$ is the $i$th eigenvector of $D_V - \mathbb{E}D_V + B$, then there is a vector $u$ in the span of the first $r$ eigenvectors of $D_V - \mathbb{E}D_V$ so that

$$\|u - v_i\| \le C\,\frac{n_1n_2p^2 + n_1^{1/2}n_2^{1/2}p}{\sqrt{n_2p\log n_1}} = O\Bigl(\frac{1}{\sqrt{\log n_1}}\Bigr).$$

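The span of the first $r$ eigenvectors of $D_V - \mathbb{E}D_V$ is supported on only $r$ coordinates, so $u$ is far from $\hat\sigma = \sigma/\sqrt{n_1}$: since $u \cdot \hat\sigma \le \sqrt{r/n_1}$ by Cauchy-Schwarz,

$$\|u - \hat\sigma\| \ge \sqrt{2 - 2\sqrt{r/n_1}} = \sqrt 2 - O(1/\sqrt{\log n_1}).$$

By the triangle inequality, $v_i$ must also be far from $\hat\sigma$: $\|v_i - \hat\sigma\| \ge \sqrt 2 - O(1/\sqrt{\log n_1})$. This proves Theorem 4. □

Appendix A. Proof of Lemma 4.

Proof.

$$\begin{aligned}
\Pr[\sigma_A, \tau_A \mid \sigma_{B\cup C}, \tau_{B\cup C}, G]
&= \frac{\Pr[\sigma_A, \tau_A, G \mid \sigma_{B\cup C}, \tau_{B\cup C}, \tilde G]}{\Pr[G \mid \sigma_{B\cup C}, \tau_{B\cup C}, \tilde G]}\\
&= \Pr[\sigma_A, \tau_A \mid \sigma_{B\cup C}, \tau_{B\cup C}, \tilde G]\cdot\frac{\Pr[G \mid \sigma_A, \tau_A, \sigma_{B\cup C}, \tau_{B\cup C}, \tilde G]}{\Pr[G \mid \sigma_{B\cup C}, \tau_{B\cup C}, \tilde G]}\\
&= \Pr[\sigma_A, \tau_A \mid \sigma_{B\cup C}, \tau_{B\cup C}, \tilde G]\cdot\frac{\Pr[G \mid \sigma, \tilde G]}{\Pr[G \mid \sigma_{B\cup C}, \tilde G]}.
\end{aligned}$$

We now show that the last factor is $1 + o(1)$ whp over $\sigma$, $\tau$, and $G$.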


Lemma 10. Let $U \subset V_1$, and let $\sigma_U$ be the restriction of $\sigma$ to $U$. Then

$$\Pr[G \mid \sigma, \tilde G] = (1 + o(1))\Pr[G \mid \sigma_U, \tilde G]$$

whp over the choices of $\sigma$ and $G$.

We leave the proof of Lemma 10 to Appendix B. 

To prove Lemma 4, it remains to show

$$\Pr[\sigma_A, \tau_A \mid \sigma_{B\cup C}, \tau_{B\cup C}, \tilde G] = (1 + o(1))\Pr[\sigma_A, \tau_A \mid \sigma_B, \tau_B, \tilde G] \tag{1}$$

whp over $\sigma$, $\tau$, $\tilde G$. Now that we have removed vertices of degree 0 and 1 from $V_2$, the proof proceeds along the same lines as the proof of Lemma 4.7 of [32].

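For $u \in V_1$, $v \in V_2$, define

$$\psi_{u,v}(G, \sigma, \tau) = \begin{cases} \delta p, & \text{if } (u,v) \in E(G),\ \sigma(u) = \tau(v),\\ (2-\delta)p, & \text{if } (u,v) \in E(G),\ \sigma(u) \ne \tau(v),\\ 1 - \delta p, & \text{if } (u,v) \notin E(G),\ \sigma(u) = \tau(v),\\ 1 - (2-\delta)p, & \text{if } (u,v) \notin E(G),\ \sigma(u) \ne \tau(v).\end{cases}$$

Define $Q_{U_1,U_2}$ to be the product of $\psi_{u,v}(G, \sigma, \tau)$ over all $u \in U_1$, $v \in U_2$. Define also $Q^{\sim}(\tilde G, \sigma, \tau)$ to be the probability, given $\sigma$ and $\tau_{V_2^{(\ge 2)}}$, that every vertex of $V_2 \setminus V_2^{(\ge 2)}$ has degree at most 1.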

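We denote by $\eta$ and $\phi$ labelings of $V_1$ and $V_2$ respectively; $\eta_A$ refers to the restriction of $\eta$ to the set $A \cap V_1$, and so on. We write $\sigma_{AB}$ instead of $\sigma_{A\cup B}$ for cleaner notation. Equation (1) is equivalent to

$$\frac{\Pr[\tilde G \mid \sigma_{ABC}, \tau_{ABC}]}{\Pr[\tilde G \mid \sigma_{BC}, \tau_{BC}]} = (1 + o(1))\,\frac{\Pr[\tilde G \mid \sigma_{AB}, \tau_{AB}]}{\Pr[\tilde G \mid \sigma_B, \tau_B]}. \tag{2}$$

We rewrite the LHS of (2) as

$$\begin{aligned}
&\frac{Q_{A,AB}(\sigma_{ABC},\tau_{ABC})\,Q_{A,C}(\sigma_{ABC},\tau_{ABC})\,Q_{BC,BC}(\sigma_{ABC},\tau_{ABC})\,Q^{\sim}(\tilde G,\sigma_{ABC},\tau_{ABC})}{\sum_{\eta:\eta_{BC}=\sigma_{BC}}\sum_{\phi:\phi_{BC}=\tau_{BC}} Q_{A,AB}(\eta_{ABC},\phi_{ABC})\,Q_{A,C}(\eta_{ABC},\phi_{ABC})\,Q_{BC,BC}(\eta_{ABC},\phi_{ABC})\,Q^{\sim}(\tilde G,\eta_{ABC},\phi_{ABC})}\\
&= \frac{Q_{A,AB}(\sigma_{AB},\tau_{AB})\,Q_{A,C}(\sigma_{AC},\tau_{AC})\,Q_{BC,BC}(\sigma_{BC},\tau_{BC})\,Q^{\sim}(\tilde G,\sigma_{ABC},\tau_{ABC})}{\sum_{\eta:\eta_{BC}=\sigma_{BC}}\sum_{\phi:\phi_{BC}=\tau_{BC}} Q_{A,AB}(\eta_{AB},\phi_{AB})\,Q_{A,C}(\eta_{AC},\phi_{AC})\,Q_{BC,BC}(\sigma_{BC},\tau_{BC})\,Q^{\sim}(\tilde G,\eta_{ABC},\phi_{ABC})}\\
&= \frac{Q_{A,AB}(\sigma_{AB},\tau_{AB})\,Q_{A,C}(\sigma_{AC},\tau_{AC})\,Q^{\sim}(\tilde G,\sigma_{ABC},\tau_{ABC})}{\sum_{\eta:\eta_{BC}=\sigma_{BC}}\sum_{\phi:\phi_{BC}=\tau_{BC}} Q_{A,AB}(\eta_{AB},\phi_{AB})\,Q_{A,C}(\eta_{AC},\phi_{AC})\,Q^{\sim}(\tilde G,\eta_{ABC},\phi_{ABC})}.
\end{aligned} \tag{3}$$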

This is a similar expression to the one encountered in the proof of Lemma 4.7 in [32], apart from the factors involving $Q^{\sim}$. To address these factors we use the following lemma:

Lemma 11. Let $U \subset V_1 \cup V_2^{(\ge 2)}$ be such that $|(V_1 \cup V_2^{(\ge 2)}) \setminus U| = o(\sqrt{n_1})$. Then for any $\eta, \phi$ with $\eta_U = \sigma_U$ and $\phi_U = \tau_U$,

$$Q^{\sim}(\tilde G, \sigma, \tau) = (1 + o(1))\,Q^{\sim}(\tilde G, \eta, \phi)$$

with probability $1 - o(1)$ over the choice of $\sigma$, $\tau$, $G$.

The proof is similar to that of Lemma 10, where we use the fact that the complement of $U$ is small to show that $\beta_1$ is essentially determined on $U$.

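With Lemma 11, equation (3) becomes

$$\begin{aligned}
&= (1+o(1))\,\frac{Q_{A,AB}(\sigma_{AB},\tau_{AB})\,Q_{A,C}(\sigma_{AC},\tau_{AC})\,Q^{\sim}(\tilde G,\sigma_{ABC},\tau_{ABC})}{\sum_{\eta:\eta_{BC}=\sigma_{BC}}\sum_{\phi:\phi_{BC}=\tau_{BC}} Q_{A,AB}(\eta_{AB},\phi_{AB})\,Q_{A,C}(\eta_{AC},\phi_{AC})\,Q^{\sim}(\tilde G,\sigma_{ABC},\tau_{ABC})}\\
&= (1+o(1))\,\frac{Q_{A,AB}(\sigma_{AB},\tau_{AB})\,Q_{A,C}(\sigma_{AC},\tau_{AC})}{\sum_{\eta:\eta_{BC}=\sigma_{BC}}\sum_{\phi:\phi_{BC}=\tau_{BC}} Q_{A,AB}(\eta_{AB},\phi_{AB})\,Q_{A,C}(\eta_{AC},\phi_{AC})}
\end{aligned}$$

with probability $1 - o(1)$. Now using (2) from [32] gives

$$= (1+o(1))\,\frac{Q_{A,AB}(\sigma_{AB},\tau_{AB})}{\sum_{\eta:\eta_{BC}=\sigma_{BC}}\sum_{\phi:\phi_{BC}=\tau_{BC}} Q_{A,AB}(\eta_{AB},\phi_{AB})}$$

with probability $1 - o(1)$. Now since $Q_{A,AB}$ does not depend on $\sigma_C, \tau_C$, we have

$$= (1+o(1))\,\frac{Q_{A,AB}(\sigma_{AB},\tau_{AB})}{\sum_{\eta:\eta_{B}=\sigma_{B}}\sum_{\phi:\phi_{B}=\tau_{B}} Q_{A,AB}(\eta_{AB},\phi_{AB})}. \tag{4}$$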


Now we can proceed similarly with the RHS of (2): 
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^ri-VAB= 0 AB QA,ABi(^AB,TAB)QA,c{VACi 4>Ac)QBC,B c{VBC-, 4>Bc)Q {G, "HABC, (pABc) 
<I>'-4>ab=tab _ 

J2'n-r]B=(TB Qa,Ab{VAB, 4>AB)QA,ci'nAC, 4>Ac)QBC,Bc{r]BC , 4>Bc)Q~iG, r/ABC, (pABc) 

<j>'-4’B=TB 

QA,ABio'AB, tab) '^V-Vab=(^ab Qa,c{i1AC, <Pac)Q BC,Bc{'nBC, 4>Bc)Q~{G, rjABC-, 4>ABc) 
_ _ <t>'-(t>AB='rAB _ 

'^V.Vb=(^b QA,AB{'nAB, 4 'Ab)Qa,c{i1AC, 4‘Ac)QBC,Bc{VBC, 4>Bc)Q~{G, rjABC-, 4>ABc) 

<t>-4>B=TB 


= (1 + 0 ( 1 )) 

= (1 + 0 ( 1 )) 

= (1 + 0 ( 1 )) 

= (1 + 0 ( 1 )) 
(5) 

= (1 + 0 ( 1 )) 


Qa,Ab{(TAB,Tab) '^V-Vab=(^ab QBC,Bc{rjBC-, 4>Bc)Q (G, rjABC, (pABc) 

_ 4>-<t>AB='rAB _ 

Y2TVB=f^B QA,AB{riAB, (pAB)QBC,Bc{r]BC, 4>Bc)Q~iG, rjABC, (pABc) 

4‘'-4‘B='rB 

Qa,Ab{(TAB,TAb) '^V-'nAB=crAB QBC,Bc{rjBC,4>Bc)Q~ {riABC,4>ABc) 

_ 4>'-<t>AB='rAB _ 

^V-Vb=o-b QA,AB{r]AB, 4 >Ab)QBC,B cir]BC, 4 >Bc)Q~{(TA, r]BC,TA, (pBc) 

4>'-<t>B=TB 

QA,AB{crAB,TAB) J2 t-Vab=^ab QBC,BcirjBC, 4>Bc)Q~{G, rjABC, (pABc) 

_ 4>'-4>AB=TAB _ 

Y)T'nB=o-B QA,AB{rjAB,4‘AB) ' '^V-Vb=(tb Q BC,Bc{r]BC, 4 >Bc)Q~ {G, a A,rjBC ,TA, 4 >Bc) 

4>:<f)B='rB 4>'-<I>B=tb 

Qa,Ab{taB,TAb) '^V-Vab=^ab QBC,Bc{r]BC, 4‘Bc)Q~ {G, rjABC, 4>ABc) 
_ <t>'-4’AB='rAB _ 

'^V-Vb=o-b QA,AB{rjAB,4>AB) ' Y)t-Vab=(tab QBC,Bc{riBC, 4 >Bc)Q~ {G , rjABC, 4 >ABc) 

4>'-<t>B=TB 4>'-4>AB=TAB 

Qa,ab{(tab,tab) 

Y)T'nB=o-B QA,AB{rjAB, 4>Ab) 

4>'-<t>B=TB 


where we have again used Lemma 11 and (2) from [32]. Now (5) matches (4) and so 
Pr[(TA, 'taIo'buc, tbuC, G] = (1 + o(l)) Pr[o-A, ta\(Tb,tb, G\ 
with probability 1 — o(l) over cr, r, G, completing the proof of Lemma 4. 


□ 


Appendix B. Proof of Lemma 10. 

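Proof. Note that conditioned on $\tilde G$ and $\beta_1$, the distribution of $G$ is that of independently choosing degree 1 or 0 for each $v \in V_2$ of degree less than 2, with a probability that depends only on $\beta_1$. Let $v \in V_2$. We can condition on $\beta_1$ and compute

$$\begin{aligned}
\Pr[d(v) = 0 \mid \beta_1] &= \tfrac12\Pr[d(v) = 0 \mid \beta_1, \tau(v) = +1] + \tfrac12\Pr[d(v) = 0 \mid \beta_1, \tau(v) = -1]\\
&= \tfrac12\Bigl[(1-\delta p)^{(1+\beta_1)n_1/2}(1-(2-\delta)p)^{(1-\beta_1)n_1/2} + (1-(2-\delta)p)^{(1+\beta_1)n_1/2}(1-\delta p)^{(1-\beta_1)n_1/2}\Bigr]\\
&= (1-\delta p)^{n_1/2}(1-(2-\delta)p)^{n_1/2}\bigl(1 + \beta_1^2n_1^2(\delta-1)^2p^2/2 + O(\beta_1^3n_1^3p^3)\bigr).
\end{aligned}$$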


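Similarly,

$$\begin{aligned}
\Pr[d(v) = 1 \mid \beta_1] &= \tfrac12\Pr[d(v) = 1 \mid \beta_1, \tau(v) = +1] + \tfrac12\Pr[d(v) = 1 \mid \beta_1, \tau(v) = -1]\\
&= \frac{n_1p}{4}\Bigl[c(p)\,(1-\delta p)^{(1+\beta_1)n_1/2}(1-(2-\delta)p)^{(1-\beta_1)n_1/2} + d(p)\,(1-(2-\delta)p)^{(1+\beta_1)n_1/2}(1-\delta p)^{(1-\beta_1)n_1/2}\Bigr]\\
&= \frac{n_1p}{4}(1-\delta p)^{n_1/2}(1-(2-\delta)p)^{n_1/2}\Bigl[2c(p)\bigl(1 + \beta_1^2n_1^2(\delta-1)^2p^2/2 + O(\beta_1^3n_1^3p^3)\bigr) + (d(p) - c(p))\Bigl(\frac{1-(2-\delta)p}{1-\delta p}\Bigr)^{\beta_1n_1/2}\Bigr],
\end{aligned}$$

where

$$c(p) = \frac{(1+\beta_1)\delta}{1-\delta p} + \frac{(1-\beta_1)(2-\delta)}{1-(2-\delta)p} \quad\text{and}\quad d(p) = \frac{(1+\beta_1)(2-\delta)}{1-(2-\delta)p} + \frac{(1-\beta_1)\delta}{1-\delta p}.$$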

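Then we have

$$\Pr[d(v) = 0 \mid d(v) \le 1, \beta_1] = \frac{\Pr[d(v) = 0 \mid \beta_1]}{\Pr[d(v) = 0 \mid \beta_1] + \Pr[d(v) = 1 \mid \beta_1]} = \frac{1 + \beta_1^2n_1^2(\delta-1)^2p^2/2 + O(\beta_1^3n_1^3p^3)}{1 + \beta_1^2n_1^2(\delta-1)^2p^2/2 + O(\beta_1^3n_1^3p^3) + O(n_1p)}$$

and

$$\Pr[d(v) = 1 \mid d(v) \le 1, \beta_1] = \frac{\frac{n_1p}{4}\Bigl[2c(p)\bigl(1 + \beta_1^2n_1^2(\delta-1)^2p^2/2 + O(\beta_1^3n_1^3p^3)\bigr) + (d(p) - c(p))\Bigl(\frac{1-(2-\delta)p}{1-\delta p}\Bigr)^{\beta_1n_1/2}\Bigr]}{1 + \beta_1^2n_1^2(\delta-1)^2p^2/2 + O(\beta_1^3n_1^3p^3) + O(n_1p)}.$$

Now $\Pr[G \mid \tilde G, \beta_1]$ is the product, over $v \in V_2^{(\le 1)}$, of $\Pr[d(v) = 0 \mid d(v) \le 1, \beta_1]$ or $\Pr[d(v) = 1 \mid d(v) \le 1, \beta_1]$ according to the degree of $v$.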


Given $\sigma$, $\beta_1$ is determined, and whp over the choice of $\sigma$, $\beta_1$ is small. Similarly, whp over the choice of $\sigma_U$, the conditional expectation of $\beta_1^2$ is small. Altogether this gives that, whp,

$$\frac{\Pr[G \mid \sigma, \tilde G]}{\Pr[G \mid \sigma_U, \tilde G]} = 1 + o(1).$$

This proves Lemma 10.


□ 


Appendix C. Proof of Lemma 5 

We use another auxiliary lemma, a high-probability bound on the norm of a random matrix with mean-zero independent entries. Such a lemma is proved for Bernoulli random entries in [38, 11]; here we extend it to Poisson entries.

Lemma 12. Let $E$ be an $n \times n$ symmetric random matrix with zeros on the diagonal and independent entries $e_{ij}$ above the diagonal which take the values $X_{ij} - \lambda_{ij}$, where each $X_{ij}$ is a Poisson random variable with mean $\lambda_{ij}$. Then there is a constant $C > 0$ so that if $\sigma^2 := \max_{ij}\lambda_{ij}$ and $\sigma^2 \ge C\log n/n$,

$$\Pr[\|E\| \ge C \cdot T \cdot \sigma\sqrt n] \le n^{-T}$$

for any $T \ge 1$.


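Proof. If $\sigma^2 \ge \log^2 n/\sqrt n$, then we can apply Theorem 1.5 from [39] (the failure probability can be made as small as $n^{-T}$ with the additional factor $T$ in the bound).

For $\sigma^2 < \log^2 n/\sqrt n$, we truncate each $X_{ij}$ by writing

$$X_{ij} = \bar X_{ij} + \hat X_{ij},$$

where $\bar X_{ij} = \min\{X_{ij}, 1\}$ and $\hat X_{ij} = X_{ij} - \bar X_{ij}$. We then define two matrices $\bar E$ and $\hat E$ with $\bar E_{ij} = \bar X_{ij} - \mathbb{E}\bar X_{ij}$ and $\hat E_{ij} = \hat X_{ij} - \mathbb{E}\hat X_{ij}$. Thus $E = \bar E + \hat E$, and so we will bound $\|E\| \le \|\bar E\| + \|\hat E\|$.

Note that $\mathbb{E}\bar X_{ij} = \mathbb{E}X_{ij}(1 + o(1))$ and each $\bar X_{ij}$ is a Bernoulli random variable, and so we can apply Lemma 3.4 from [38] to get $\|\bar E\| \le CT\sigma\sqrt n$ with probability at least $1 - n^{-T}$ (again, inspecting the details of the proof in [38] gives a failure probability of $n^{-T}$ at the expense of the extra factor $T$ in the bound).

To bound $\|\hat E\|$, consider one row sum, $\sum_j (\hat X_{ij} - \mathbb{E}\hat X_{ij})$. For $j = 1, \ldots, n$, the random variables $\hat X_{ij} - \mathbb{E}\hat X_{ij}$ are independent, mean-zero random variables with variance $O(\sigma^4)$ and Poisson tails, and so a Chernoff bound gives

$$\Pr\Bigl[\Bigl|\sum_j (\hat X_{ij} - \mathbb{E}\hat X_{ij})\Bigr| \ge CT\sigma\sqrt n\Bigr] \le \exp\Bigl(-\frac{c'T^2}{\sigma^2}\Bigr) \le \exp\Bigl(-C'T^2\,\frac{\sqrt n}{\log^2 n}\Bigr),$$

and so with probability at least $1 - n^{-T}$ all row sums (and thus $\|\hat E\|$) are at most $CT\sigma\sqrt n$.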


□ 


With this we prove Lemma 5. 

Proof of Lemma 5. First, note that under the conditions of Theorem 3, $n_2 \ge n_1\log^k n_1$ for a sufficiently large constant $k$, so $n_1p \le 1/\log n_1$. (In fact, cases in which the density is much higher than this can be dealt with by the standard method of bounding $\|M - \mathbb{E}M\|$.)

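(1): We can compute $(\mathbb{E}B)_{ij}$: this is the expected number of paths of length 2 from $i$ to $j$ in $G$. Say $\sigma(i) = \sigma(j)$, $i \ne j$. Then

$$(\mathbb{E}B)_{ij} = \frac{n_2}{2}\delta^2p^2 + \frac{n_2}{2}(2-\delta)^2p^2 = n_2p^2(\delta^2 - 2\delta + 2).$$

If $\sigma(i) \ne \sigma(j)$, then

$$(\mathbb{E}B)_{ij} = n_2p^2(2\delta - \delta^2).$$

The diagonal entries of $\mathbb{E}B$ are 0 by construction. So $\mathbb{E}B$ is a rank-2 matrix plus a multiple of the identity: $\mathbb{E}B = \lambda_1J/n_1 + \lambda_2\,\sigma\sigma^T/n_1 - n_2p^2(\delta^2 - 2\delta + 2)\,I$, with $\lambda_1 = n_1n_2p^2$ and $\lambda_2 = (\delta-1)^2n_1n_2p^2$.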
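Part 1 of Lemma 5 can be checked numerically. The sketch below (hypothetical helper `expected_B`) fills in the two entry values computed above, zeroes the diagonal, and compares the result with $\lambda_1 J/n_1 + \lambda_2\sigma\sigma^T/n_1$ shifted by $a I$, where $a = n_2p^2(\delta^2 - 2\delta + 2)$. With balanced $\sigma$, the top two eigenvalues are then $\lambda_1 - a$ and $\lambda_2 - a$, with eigenvectors $\mathbf{1}/\sqrt{n_1}$ and $\sigma/\sqrt{n_1}$. The sizes are illustrative.

```python
import numpy as np

def expected_B(sigma, n2, p, delta):
    """E[B] for balanced tau: off-diagonal entry (i, j) is the expected number
    of length-2 paths i -> v -> j; the diagonal is 0 because it was deleted."""
    same = np.equal.outer(sigma, sigma)
    a = n2 * p**2 * (delta**2 - 2 * delta + 2)   # sigma(i) == sigma(j)
    b = n2 * p**2 * (2 * delta - delta**2)       # sigma(i) != sigma(j)
    EB = np.where(same, a, b).astype(float)
    np.fill_diagonal(EB, 0.0)
    return EB, a

n1, n2, p, delta = 10, 1000, 0.01, 1.5
sigma = np.array([1] * 5 + [-1] * 5)
EB, a = expected_B(sigma, n2, p, delta)
lam1, lam2 = n1 * n2 * p**2, (delta - 1) ** 2 * n1 * n2 * p**2
print(np.sort(np.linalg.eigvalsh(EB))[::-1][:3], lam1 - a, lam2 - a)
```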

(2): The matrix $B - \mathbb{E}B$ is symmetric with mean-zero entries, but the entries are not quite independent, and so we cannot directly apply a bound like Lemma 12. Instead, we will first decompose the matrix into the sum of a sequence of adjacency matrices of subgraphs induced by vertices of a given degree in $V_2$. We will couple each matrix in the sum to a matrix with independent entries, apply Lemma 12 to each, and then take the sum of these bounds as our upper bound on $\|B - \mathbb{E}B\|$.

We decompose the graph $G$ by sorting the vertices of $V_2$ by degree. Let $V_2^{(i)}$, $i = 1, 2, \ldots$, be the set of vertices in $V_2$ of degree $i$.

Let $M_i$ be the adjacency matrix of $G$ induced by $V_1 \cup V_2^{(i)}$. The main idea of the decomposition is that $M_1$ does not contribute to $B$, as its edges only appear on the diagonal of $MM^T$, and that nearly all of the remaining edges in the graph are in $M_2$. We write

$$M = M_1 + M_2 + M_3 + \cdots$$

and

$$MM^T = M_1M_1^T + M_2M_2^T + M_3M_3^T + \cdots.$$

The cross terms disappear since the matrices are supported on disjoint sets of columns.

Recall $B = MM^T - \mathrm{diag}(MM^T)$, and let $B_i = M_iM_i^T - \mathrm{diag}(M_iM_i^T)$. We have

$$B = B_2 + B_3 + \cdots,$$

since $M_1M_1^T$ is a diagonal matrix.
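The identity $B = B_2 + B_3 + \cdots$ is purely combinatorial and can be verified on any 0/1 matrix. The sketch below (our own helper `degree_decomposition`) groups the columns of $M$ by degree and forms each $B_i$. One can also confirm that $B_1 = 0$ and that the strictly upper-triangular part of $B_i$ sums to $|V_2^{(i)}|\binom{i}{2}$, the count of 1's per vertex used in the next paragraph.

```python
import numpy as np

def degree_decomposition(M):
    """Return {i: B_i} with B_i = M_i M_i^T - diag(M_i M_i^T), where M_i keeps
    only the columns (V2 vertices) of degree i."""
    M = np.asarray(M, dtype=float)
    deg = M.sum(axis=0).astype(int)
    parts = {}
    for i in np.unique(deg):
        Mi = M[:, deg == i]
        Gi = Mi @ Mi.T
        parts[int(i)] = Gi - np.diag(np.diag(Gi))
    return parts

rng = np.random.default_rng(1)
M = (rng.random((30, 200)) < 0.03).astype(int)
parts = degree_decomposition(M)
print(sorted(parts))
```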

A vertex $v \in V_2^{(i)}$ has exactly $i$ neighbors, and its contribution to $B_i$ is 1 in each entry $(u, w)$ where $u \ne w$ are neighbors of $v$. Call $M_v$ the adjacency matrix induced by $v$, and $B_v = M_vM_v^T - \mathrm{diag}(M_vM_v^T)$. We have $B_i = \sum_{v \in V_2^{(i)}} B_v$, and $B_v$ has $\binom{i}{2}$ 1's above the diagonal. Now for each $i \ge 3$, we randomly split each $B_v$, $v \in V_2^{(i)}$, into $\binom{i}{2}$ symmetric matrices $B_v^{(1)}, \ldots, B_v^{(\binom{i}{2})}$ by randomly assigning each of the 1's above the diagonal to a unique matrix, along with the symmetric 1 below the diagonal. Then we combine the matrices as

$$B_i^{(j)} = \sum_{v \in V_2^{(i)}} B_v^{(j)}.$$

Each $B_i$ is the adjacency matrix of a random graph formed by adding a given number of random $i$-cliques to an empty graph. Clearly there are correlations between edges in such a graph, and so the purpose of this decomposition is to split the graph into $\binom{i}{2}$ graphs $G_i^{(j)}$, $j = 1, \ldots, \binom{i}{2}$, with independent edges. This gives a sequence of adjacency matrices $B_i^{(1)}, \ldots, B_i^{(\binom{i}{2})}$ with identical distributions, but not independent across the different matrices.

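Altogether, we write

$$B = B_2 + \sum_{i=3}^{\infty}\sum_{j=1}^{\binom{i}{2}} B_i^{(j)}$$

and

$$B = \mathbb{E}B + (B_2 - \mathbb{E}B_2) + \sum_{i=3}^{\infty}\sum_{j=1}^{\binom{i}{2}}\bigl(B_i^{(j)} - \mathbb{E}B_i^{(j)}\bigr), \tag{6}$$

where

$$\mathbb{E}B = \mathbb{E}B_2 + \sum_{i=3}^{\infty}\sum_{j=1}^{\binom{i}{2}}\mathbb{E}B_i^{(j)}.$$

What remains is to bound the norms of the mean-zero random matrices in the decomposition above: $\|B_2 - \mathbb{E}B_2\|$ and the $\|B_i^{(j)} - \mathbb{E}B_i^{(j)}\|$'s.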

Let $L_+^{(i)}, L_-^{(i)}$ be the numbers of vertices of $V_2$ with degree $i$ and label $+$ or $-$ respectively. As a first step, we calculate the expectations of these random variables:

$$\begin{aligned}
\mathbb{E}|L_+^{(i)}| = \mathbb{E}|L_-^{(i)}| &= \frac{n_2}{2}\Pr[d(v) = i] \qquad (v \in V_2)\\
&= \frac{n_2}{2}\Pr[\mathrm{Bin}(n_1/2, \delta p) + \mathrm{Bin}(n_1/2, (2-\delta)p) = i]\\
&= \frac{n_2}{2}\Pr[\mathrm{Poisson}(n_1p) = i]\,(1 + o(1))\\
&= \frac{n_2}{2}\cdot\frac{e^{-n_1p}(n_1p)^i}{i!}\,(1 + o(1))\\
&= \frac{n_2(n_1p)^i}{2\cdot i!}\,(1 + o(1)),
\end{aligned} \tag{7}$$

where we use the total variation distance bound on a Poisson approximation of a binomial, along with our assumption $n_1p = o(1)$. Let $i_0$ be the smallest $i$ so that $\mathbb{E}|L_+^{(i)}| < \log n_1$. Considering the sum $B_{\ge i_0} = \sum_{i=i_0}^{\infty}\sum_{j=1}^{\binom{i}{2}} B_i^{(j)}$, we see that the expected row sums of $B_{\ge i_0}$ are bounded by $O(\log n_1)$, and so from a Chernoff bound, whp all row sums are $O(\log n_1)$. This gives

$$\Bigl\|\sum_{i=i_0}^{\infty}\sum_{j=1}^{\binom{i}{2}}\bigl(B_i^{(j)} - \mathbb{E}B_i^{(j)}\bigr)\Bigr\| \le \|B_{\ge i_0}\| + \|\mathbb{E}B_{\ge i_0}\| \le C\log n_1 \le Cn_1^{1/2}n_2^{1/2}p \tag{8}$$

whp.
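To get a feel for how small $i_0$ is, the sketch below evaluates the approximation (7) and, for comparison, the exact binomial-convolution probability. `expected_L`, `exact_L`, and `smallest_i0` are hypothetical helpers, and the parameters are arbitrary. For example, with $n_1 = 1000$, $n_2 = 10^7$, $p = 10^{-5}$ we have $n_1p = 0.01$, and (7) drops below $\log n_1$ already at $i = 3$.

```python
import math

def expected_L(i, n1, n2, p):
    """Poisson approximation (7): E|L_+^{(i)}| ~ (n2/2) e^{-n1 p} (n1 p)^i / i!."""
    lam = n1 * p
    return n2 / 2 * math.exp(-lam) * lam**i / math.factorial(i)

def exact_L(i, n1, n2, p, delta):
    """Exact (n2/2) * P[Bin(n1/2, delta p) + Bin(n1/2, (2 - delta) p) = i]."""
    m = n1 // 2
    q1, q2 = delta * p, (2 - delta) * p
    prob = sum(
        math.comb(m, k) * q1**k * (1 - q1) ** (m - k)
        * math.comb(m, i - k) * q2 ** (i - k) * (1 - q2) ** (m - i + k)
        for k in range(i + 1)
    )
    return n2 / 2 * prob

def smallest_i0(n1, n2, p, i_max=200):
    """Smallest i with (approximate) E|L_+^{(i)}| < log n1."""
    for i in range(i_max + 1):
        if expected_L(i, n1, n2, p) < math.log(n1):
            return i
    raise ValueError("no i0 <= i_max")

print(smallest_i0(1000, 10**7, 1e-5))
print(exact_L(2, 1000, 10**7, 1e-5, 1.5), expected_L(2, 1000, 10**7, 1e-5))
```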

Now consider $i < i_0$. Let $N^{(i,j)}(++)$, $N^{(i,j)}(--)$, $N^{(i,j)}(+-)$ be the numbers of edges between vertices with the respective labels in the graph corresponding to $B_i^{(j)}$. Conditioned on $N^{(i,j)}(++)$, $N^{(i,j)}(--)$, $N^{(i,j)}(+-)$, the edges are distributed uniformly (with replacement) in the respective categories. Alternatively, consider the adjacency matrices $\hat B_2, \hat B_i^{(j)}$, where each edge $(u,v)$ appears with multiplicity given by an independent Poisson random variable of mean $\lambda_{uv}^{(i)}$ (one of two values depending on whether $u$ and $v$ have the same or opposite labels). Again in this setting, conditioned on the number of edges of each type, $\hat N^{(i,j)}(++)$, $\hat N^{(i,j)}(--)$, $\hat N^{(i,j)}(+-)$, the edges are distributed uniformly with replacement in the respective categories.

Since $\mathbb{E}\hat B_2 = \mathbb{E}B_2$ and $\mathbb{E}\hat B_i^{(j)} = \mathbb{E}B_i^{(j)}$ by construction, we can apply Lemma 12 and choose $C$ large enough to get

$$\|\hat B_2 - \mathbb{E}\hat B_2\| \le Cn_1^{1/2}\sqrt{4\mathbb{E}|L_+^{(2)}|/n_1^2} \tag{9}$$

whp, as each entry has variance bounded by $4\mathbb{E}|L_+^{(2)}|/n_1^2$, and similarly

$$\|\hat B_i^{(j)} - \mathbb{E}\hat B_i^{(j)}\| \le C\cdot i\cdot n_1^{1/2}\sqrt{4\mathbb{E}|L_+^{(i)}|/n_1^2} \tag{10}$$

with probability at least $1 - n_1^{-i}$. Note that from (7), the means $\mathbb{E}|L_+^{(i)}|$ decrease with $i$ faster than $1/i!$, and so summing (10) over $3 \le i < i_0$ and all $j$ gives a bound of $Cn_1^{1/2}n_2^{1/2}p$.




To transfer these bounds, we couple the Poisson matrices with $B_2$ and the $B_i^{(j)}$'s. If the means of $N^{(i,j)}(++)$, $N^{(i,j)}(--)$, $N^{(i,j)}(+-)$ are small enough, we can couple the matrices to be equal whp. If the means are large, we couple so that $\|\hat B_i^{(j)} - B_i^{(j)}\|$ is small. Take $N^{(i,j)}(++)$. Its distribution is $\mathrm{Bin}(n_2, q)$ for some $q$ that depends on $n_1$, $p$, $i$, and $\delta$. The corresponding random variable $\hat N^{(i,j)}(++)$ is $\mathrm{Poisson}(n_2q)$. Say $q = o(n_2^{-1/2})$. In this case the total variation distance between the two is $O(n_2q^2)$, and so we can couple the corresponding matrices to be equal whp. Since $q$ decreases like $1/i!$, we can sum the deviation probabilities over all $i$ and $j$. When $q = \Omega(n_2^{-1/2})$, we write $N^{(i,j)}(++)$ as the sum of $n_2$ independent $\mathrm{Ber}(q)$ random variables and $\hat N^{(i,j)}(++)$ as the sum of $n_2$ independent $\mathrm{Poisson}(q)$ random variables, and couple term by term using an optimal coupling with respect to total variation distance. Then the difference $\hat N^{(i,j)}(++) - N^{(i,j)}(++)$ is the sum of $n_2$ mean-zero random variables of variance $O(q^2)$, and so whp the difference is bounded by $O(qn_2^{1/2}\log n_1)$. We can couple the matrices so that their difference has non-zero entries distributed uniformly in entries corresponding to $+,+$ labels. Then, as above, a Chernoff bound shows that the row sums (and thus the matrix norm) of the difference matrix are all bounded by $O(qn_1^{-1}n_2^{1/2}\log n_1)$. Since $q \le n_1^2p^2$, this gives a bound on the norm of $O(n_1n_2^{1/2}p^2\log n_1) = o(n_1^{1/2}n_2^{1/2}p)$.

All together the bound (8) and the transferred bounds give

$$\|B - \mathbb{E}B\| \le Cn_1^{1/2}n_2^{1/2}p$$

whp, which completes the proof of part 2 of Lemma 5.

Parts 3 and 4 of Lemma 5 follow from the observation that the $i$th diagonal entry of $D_V$ is the degree of the $i$th vertex of $V_1$. Since the degrees of vertices of $V_1$ have identical distributions, the expectation matrix is a multiple of the identity. For part 4 we use Chernoff bounds. We have

$$\|D_V - \mathbb{E}D_V\| = \max_i|(D_V)_{ii} - n_2p| \le C\sqrt{n_2p\log n_1}$$

whp. □
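Part 4 of Lemma 5 is a maximum of $n_1$ degree deviations. The sketch below (hypothetical helper `degree_fluctuation`) computes $\max_i|(D_V)_{ii} - n_2p|$ together with the scale $\sqrt{n_2p\log n_1}$ for comparison. It assumes $\mathbb{E}(D_V)_{ii} = n_2p$, which holds when $\tau$ is balanced (Lemma 9); the example uses the simplest case $\delta = 1$ with illustrative sizes.

```python
import math

import numpy as np

def degree_fluctuation(M, p):
    """Return (max_i |deg(i) - n2 p|, sqrt(n2 p log n1)) for the V1 vertices,
    assuming E deg(i) = n2 p."""
    M = np.asarray(M)
    n1, n2 = M.shape
    deg = M.sum(axis=1)                          # diagonal of D_V
    fluct = float(np.max(np.abs(deg - n2 * p)))  # ||D_V - E D_V|| (diagonal matrix)
    scale = math.sqrt(n2 * p * math.log(n1))
    return fluct, scale

rng = np.random.default_rng(0)
M = rng.random((200, 5000)) < 0.01               # delta = 1: i.i.d. edges
print(degree_fluctuation(M, 0.01))
```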